I'd argue that an invalid string concept would be neither correct nor efficient. Why should all code that deals with strings carry the burden of fallibility of a subset of string functions?
You've mentioned NaN propagation in another comment and I think that's a perfect example of the problem with this approach. Sorting a vector of arbitrary floats is a notoriously thorny problem because any float could be NaN, and as NaN is incomparable to any other float, there is no total ordering of floats. There is no general solution to this problem that doesn't involve making assumptions that could be faulty for some applications.
Please support your argument against correctness by providing an example where an INVALID string as input to a suitably modified generic string function would result in a valid string.
I would expect an invalid string to have an invalid length. For integer-valued lengths you'd have to use a negative number to differentiate from a valid, empty string. But then the sum of the invalid-string lengths differs from the length of the concatenated invalid strings. Which is wonky.
Safe string manipulation never exceeds the bounds of the buffer. So negative values are dangerous, as are any additions that would exceed the maximum size.
Negative lengths are not compatible with unsigned representation.
A system implementing invalid string values must choose a text encoding, such as UTF-8, that supports the concept of an invalid character. Null termination is too flexible; so is simple length prepending.
I don't understand the fallibility objection.
Clearly misuse of string functions is epidemic.
A propagating INVALID string result makes it very clear there is a logic error and not an exploit.
I understand how one could shoot down implementations, but no one has made a convincing argument for shooting down the idea.
I wouldn't want one more special case to test against (empty string / null string / 'invalid' string). Why can't those operations just return error codes instead? Where does it end: should memcpy, when you try to copy into a buffer that's too small, write an 'invalid buffer' value instead?
”This would be the result of trying to copy a string to a buffer that is too small.”
C doesn’t have the notion of “size of buffer” (yes, arrays have a size that can be queried with sizeof, but only at compile time). You would have to fix that first.
NULL is the lack of any string.
If one views a string as the result of an operation, then an INVALID string is the consequence of bad input to an operation.
If you have studied Computer Science, you should know that the null string is quite a valid string.
Let's take strstr, which finds a matching substring needle in a haystack string. It:
-returns NULL if the needle is not in the haystack.
-returns a pointer to the first matching substring otherwise.
Extend strstr with VALIDITY:
Understood behaviour if both are valid.
Say the haystack is INVALID... as the return value is either NULL or a substring of the haystack, it should return INVALID. A poisoned haystack should poison dependent strings.
Say the haystack is valid but the needle is INVALID... it should return NULL. A valid string never contains an INVALID string as a substring.
We have the empty string: "\0"
We have the null string: NULL
There is no concept of an INVALID string, the way float has NaN.
This would be the result of trying to copy a string to a buffer that is too small.
Or sprintf() into a small buffer.
Or a raw string parsed as UTF-8 and is invalid.
Correctness over efficiency.