
Efficiency looks past current deficiency.

We have the empty string: "" (nothing but the \0 terminator)

We have the null string: NULL

There is no concept of an INVALID string, as float has NAN.

This would be the result of trying to copy a string to a buffer that is too small.

Or sprintf() into a small buffer.

Or a raw string that is parsed as UTF-8 and turns out to be invalid.

Correctness over efficiency.
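To make the idea concrete, here is a minimal sketch in C of a string value that carries its own validity, the way a float carries NaN. The type `vstr`, the capacity `VSTR_CAP`, and the functions `vstr_invalid` and `vstr_from` are all hypothetical names invented for illustration; nothing like them exists in the standard library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define VSTR_CAP 16  /* hypothetical fixed capacity, terminator included */

/* Hypothetical string value with an explicit validity flag. */
typedef struct {
    bool valid;
    char data[VSTR_CAP];
} vstr;

/* The INVALID value: flagged invalid, contents empty. */
static vstr vstr_invalid(void) {
    vstr s = { .valid = false, .data = "" };
    return s;
}

/* Copy a C string into a vstr. If it does not fit, the result is
 * INVALID rather than a silently truncated copy. */
static vstr vstr_from(const char *src) {
    assert(src != NULL);  /* NULL means "no string", which is a separate case */
    size_t n = strlen(src);
    if (n >= VSTR_CAP) {
        return vstr_invalid();
    }
    vstr s = { .valid = true, .data = "" };
    memcpy(s.data, src, n + 1);  /* n + 1 copies the terminator too */
    return s;
}
```

With this sketch, `vstr_from("hello")` is valid, while a source of 16 or more characters yields the INVALID value instead of a cut-off string.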



I'd argue that an invalid string concept would be neither correct nor efficient. Why should all code that deals with strings carry the burden of fallibility of a subset of string functions?

You've mentioned NaN propagation in another comment and I think that's a perfect example of the problem with this approach. Sorting a vector of arbitrary floats is a notoriously thorny problem because any float could be NaN, and as NaN is incomparable to any other float, there is no total ordering of floats. There is no general solution to this problem that doesn't involve making assumptions that could be faulty for some applications.
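A small C sketch of the sorting problem: the usual three-way comparison reports NaN as "equal" to every number, which breaks the transitivity that sort routines rely on. The second comparator is one possible workaround, and it bakes in exactly the kind of assumption mentioned above (here: all NaNs sort last). Both function names are made up for this example.

```c
#include <math.h>

/* Naive three-way comparison, as often wrapped for qsort().
 * Every comparison involving NaN is false, so this returns 0
 * ("equal") whenever either argument is NaN. */
static int cmp_naive(double a, double b) {
    return (a > b) - (a < b);
}

/* One possible total order, ASSUMING it is acceptable for the
 * application that every NaN sorts after every number and that
 * all NaNs compare equal to each other. */
static int cmp_nan_last(double a, double b) {
    int a_nan = isnan(a) != 0;
    int b_nan = isnan(b) != 0;
    if (a_nan || b_nan) {
        return a_nan - b_nan;
    }
    return (a > b) - (a < b);
}
```

With `cmp_naive`, NaN compares equal to both 1.0 and 2.0, yet 1.0 and 2.0 are not equal to each other, so the "order" is inconsistent.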


Please support your argument against correctness by providing an example where an INVALID string as input to a suitably modified generic string function would result in a valid string.


What is the length of an invalid string? What is the length of the concatenation of two invalid strings?

There are sensible answers. But they are weird.


Is it more sensible to cat 2 strings, but cut off the second one, then pass off the result as valid?

I would say let an INVALID string be length 0. Then accept that catting a valid and invalid string would result in a shorter length.

Which one do you think is safer?
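Roughly what I have in mind, as a sketch in C. The type `vs`, the capacity `VS_CAP`, and the functions below are hypothetical, and the "INVALID has length 0" rule is the convention proposed above, not an established one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define VS_CAP 8  /* hypothetical capacity, terminator included */

typedef struct {
    bool   valid;
    size_t len;            /* always 0 when !valid */
    char   data[VS_CAP];
} vs;

static vs vs_invalid(void) {
    vs s = { .valid = false, .len = 0, .data = "" };
    return s;
}

/* Build a vs from a non-NULL C string; too long means INVALID. */
static vs vs_make(const char *src) {
    size_t n = strlen(src);
    if (n >= VS_CAP) {
        return vs_invalid();
    }
    vs s = { .valid = true, .len = n, .data = "" };
    memcpy(s.data, src, n + 1);
    return s;
}

/* Concatenate. An INVALID operand poisons the result, and a result
 * that would not fit is INVALID instead of being cut off. */
static vs vs_cat(vs a, vs b) {
    if (!a.valid || !b.valid) {
        return vs_invalid();
    }
    if (a.len + b.len >= VS_CAP) {
        return vs_invalid();
    }
    vs r = a;
    memcpy(r.data + a.len, b.data, b.len + 1);
    r.len = a.len + b.len;
    return r;
}
```

Here "abc" + "def" gives a valid 6-character string, while "abcd" + "efgh" does not fit in 8 bytes and comes back INVALID with length 0 rather than as a truncated "abcdefg".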


I would expect an invalid string to have an invalid length. For integer-valued lengths you'd have to use a negative number to differentiate from a valid, empty string. But then the sum of the invalid-string lengths differs from the length of the concatenated invalid strings. Which is wonky.


Safe string manipulation never exceeds the bounds of the buffer. So negative values are dangerous, as well as any additions that would exceed the maximum size.

Negative lengths are not compatible with unsigned representation.

A system implementing invalid string values must choose a text encoding such as UTF-8 that supports the concept of an invalid character. Null termination is too flexible. So is simple length prefixing.


It's not an "argument against correctness"; it's an argument against what you are proposing.


I don't understand the fallibility. Clearly misuse of string functions is epidemic. A propagating INVALID string result makes it very clear there is a logic error and not an exploit.

I understand how one could shoot down implementations, but no one has made a convincing argument for shooting down the idea.


I wouldn't prefer one more special case to test against (empty string / null string / 'invalid' string). Why can't those operations just return error codes instead? And what about memcpy: if you try to memcpy into a buffer that's too small, does it write an 'invalid buffer' type instead?


Propagating NAN is an elegant method in floating point and makes sense for well defined string encodings like UTF-8.

memcpy and company are strictly for raw unencoded buffers.


“This would be the result of trying to copy a string to a buffer that is too small.”

C doesn’t have the notion of “size of buffer” (yes, arrays have a size that can be queried by sizeof, but only at compile time). You would have to fix that, first.


> There is no concept of an INVALID string, as float has NAN.

Isn’t that just NULL?


NULL is the lack of any string. If one views a string as the result of an operation, then an INVALID string is the consequence of bad input to an operation.


Why can’t NULL serve as the invalid string in this case? It’s clearly not a valid string that an operation will return.


If you have studied Computer Science, you should know that the null string is quite a valid string.

Let's take strstr, which finds a matching substring needle in a haystack string.

- returns a NULL string if the needle is not in the haystack.
- returns a pointer to the first matching substring otherwise.

Extend strstr with VALIDITY

Understood behaviour if both are valid.

Say the haystack is INVALID: since the return value is either NULL or a strict substring of the haystack, it should return INVALID. A poison haystack should poison dependent strings.

Say the haystack is valid but the needle is INVALID: it should return NULL. A valid string never contains an INVALID string as a subsequence.
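A sketch of those rules in C, layered on top of the standard strstr. The `vref` type and `vref_strstr` function are hypothetical names. How a valid-but-NULL operand should behave is not spelled out above, so treating it as "not found" is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical string reference with a validity flag.
 * valid && s == NULL means "no string" (for example, not found).
 * !valid means INVALID; s is ignored in that case. */
typedef struct {
    bool        valid;
    const char *s;
} vref;

static vref vref_strstr(vref haystack, vref needle) {
    const vref invalid   = { false, NULL };
    const vref not_found = { true,  NULL };

    /* A poison haystack poisons anything derived from it. */
    if (!haystack.valid) {
        return invalid;
    }
    /* A valid string never contains an INVALID one. */
    if (!needle.valid) {
        return not_found;
    }
    /* ASSUMPTION: a NULL operand simply means nothing can be found. */
    if (haystack.s == NULL || needle.s == NULL) {
        return not_found;
    }
    vref r = { true, strstr(haystack.s, needle.s) };
    return r;
}
```

The caller can now tell three outcomes apart: a match (valid, non-NULL), no match (valid, NULL), and a result derived from bad input (INVALID).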


Here's my behavior:

  strstr(NULL, /* valid string */)
I can't find the needle in the haystack (actually, I can't find anything in the haystack. I can't find the haystack.) Thus I return NULL.

  strstr(/* valid string */, NULL)
I can't find the needle in the haystack (actually, I wouldn't be able to find it: I don't know what I'm looking for.) Return NULL.


Your explanation is not inconsistent with my proposal, but you don't seem to grasp VALIDITY.



