I agree filepath-related tasks are ugly. But there are a number of reasons for that which aren't related to zero termination. First, there is the syntax and semantics of filepaths. Strings (of whatever kind, just thinking about their monoidal structure) are a convenient user interface for specifying filepath constants, but they're annoying to construct from, and disassemble into, filepath components programmatically (relative to how easy I think it should be). Because of the complicated syntax and especially semantics of components and paths, there are a lot of pitfalls. Filepath handling is most conveniently done in the shell, where nobody is under any illusion that it isn't fragile.

Second, you're talking about memory allocation, and this is arguably orthogonal to the string representations we're discussing here. Whether you make a copy or not for example totally depends on your specific situation. The same considerations arise for any array or slice type.

Third, again, you're free to make substrings using pointer + length or whatever, and this is in many cases the best solution. I could even agree that format strings should have better standardized support for explicit length, but it's really not a pain point for me. I'm only stating that zero-terminated is an acceptable default for string literals, and I want to stress this with another example: Last time you were looking at a binary using your editor or pager, how much better has your experience been thanks to NUL terminators? This argument can also extend to runtime debugging somewhat.
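For reference, standard C format strings do already accept an explicit length through the `%.*s` precision. The sketch below is only an illustration of that mechanism; the helper name `format_substring` is hypothetical and not something from this thread.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: format up to len bytes of s, starting at offset
   start, into out without first copying the substring anywhere.
   Returns whatever snprintf returns. */
int format_substring(char *out, size_t out_size,
                     const char *s, size_t start, size_t len)
{
    /* The precision is passed as an int, so the length must fit in one. */
    assert(len <= INT_MAX);
    /* "%.*s" prints at most (int)len bytes and stops early at a NUL. */
    return snprintf(out, out_size, "%.*s", (int)len, s + start);
}
```

The cast to `int` is the awkward part: the length lives in a separate argument with a different type from `size_t`.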



> memory allocation, and this is arguably orthogonal to the string representations

A substringz cannot be produced from a stringz without doing an allocation.

> you're free to make substrings using pointer + length or whatever, and this is in many cases the best solution

Right, I can. And it's an ongoing nuisance in C to do so, because it doesn't have proper abstractions to build new types with. Even worse, if I switch my stringz to length delimited, and then pass it to fopen() which wants a stringz, I have to convert my length delimited string to stringz even though it is already a stringz. Because my length delimited API has no mechanism to say it also is 0 terminated.
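A minimal sketch of the situation described above, assuming a hand-rolled pointer + length type. The names `str_slice`, `slice_from_cstr`, `slice_sub` and `slice_to_cstr` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical pointer + length view; it does not own its bytes. */
typedef struct {
    const char *ptr;
    size_t len;
} str_slice;

/* Wrap a zero-terminated string: one strlen, no allocation. */
str_slice slice_from_cstr(const char *s)
{
    str_slice v = { s, strlen(s) };
    return v;
}

/* Sub-slice [start, end): pointer arithmetic only, no allocation. */
str_slice slice_sub(str_slice v, size_t start, size_t end)
{
    assert(start <= end && end <= v.len);
    str_slice r = { v.ptr + start, end - start };
    return r;
}

/* Produce a zero-terminated copy for APIs such as fopen().
   The slice has no way to say "I am already terminated", so this
   always copies. The caller frees the result; NULL means malloc failed. */
char *slice_to_cstr(str_slice v)
{
    char *z = malloc(v.len + 1);
    if (z == NULL)
        return NULL;
    memcpy(z, v.ptr, v.len);
    z[v.len] = '\0';
    return z;
}
```

Taking a substring is free here, but every call into a stringz API pays for a copy, even when the slice happens to cover a whole zero-terminated string.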

You wind up with two string representations in your code, and then what? Have each string function come in a pair?

Believe me, I've done this stuff, I've thought about it a lot, and there is no happy solution. It annoys me enough that C is just not a tool I want to reach for anymore. I'm just tired of ugly, buggy C string code.

The good news is there is a fix, and I've proposed it, but it gets zero traction:

https://www.digitalmars.com/articles/C-biggest-mistake.html


> You wind up with two string representations in your code, and then what? Have each string function come in a pair?

As I said, I don't think this is the end of the world, and I'm likely to add a number of other string representations anyway. While it happens rarely, I don't worry about copying a string into a temporary in the form an API expects before calling it, because most "string" things are small and dispensable. Zero-terminated strings are the cheap plastic solution that just works for passing string literals to printf, and that can be viewed directly in a binary. And they're compatible with length delineated in the sense that you can supply a (cheap plastic) zero-terminated string to a (more serious) length delineated API. Also the other way, many length delineated APIs are designed to work with both - supply -1 as length, and you can happily put a string literal as argument, don't even have to macro your way with sizeof then to supply the right length.
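The "-1 as length" convention mentioned above could look roughly like the following. This is a sketch under assumptions: `resolve_len` and `count_byte` are hypothetical names, and real APIs that use this convention differ in their details.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: a negative len means "s is zero-terminated,
   measure it with strlen"; otherwise len is used as given. */
size_t resolve_len(const char *s, ptrdiff_t len)
{
    assert(s != NULL);
    return len < 0 ? strlen(s) : (size_t)len;
}

/* Example of a length-taking function built on that convention:
   counts how many of the first len bytes of s are equal to c. */
size_t count_byte(const char *s, ptrdiff_t len, char c)
{
    size_t n = resolve_len(s, len);
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (s[i] == c)
            count++;
    }
    return count;
}
```

A caller can write `count_byte("a/b/c", -1, '/')` with a literal, or pass an explicit length for a slice. The cost is one branch, and possibly one strlen, at the top of every such function.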

> The good news is there is a fix, and I've proposed it, but it gets zero traction

I'm aware of this and I like it ("fat pointers") but I wouldn't like it if the APIs would miss the explicit length argument because there's a size field glued to the slice.


> many length delineated APIs are designed to work with both - supply -1 as length, and you can happily put a string literal as argument, don't even have to macro your way with sizeof then to supply the right length.

I'm sorry, I just have to say "no thanks" to that. I don't really want each string function to test the length and run strlen if it isn't there.

By now, the D community has 20 years of experience with length as part of the string type. Nobody wants to go back to the C way. It's probably the most unambiguously successful and undisputed feature of D. C code that gets converted to D gets scrubbed of the stringz code, and the result is cleaner and faster.

D still interfaces with C and C strings. The conversion is done as the last step before calling the C function. (There's a clever way to add a 0 that only rarely requires an allocation.) Any C strings returned get immediately converted with the slice idiom:

    string s = p[0 .. strlen(p)];

> I wouldn't like it if the APIs would miss the explicit length argument because there's a size field glued to the slice.

I bet you would like it! (Another problem with a separate length field is there's no obvious connection between it and the string - which is another source of bugs.)


> Last time you were looking at a binary using your editor or pager, how much better has your experience been thanks to NUL terminators?

Not perceptibly better. And yeah, I do look at binary dumps now and then, after all, I wrote the code that generates ELF, OMF, MachO, and MSCOFF object file formats, and librarians for them :-)


I wrote simple ELF and PE/COFF writers too, but independently of that, zero terminators are what lets you find strings in a binary, and what allows the "strings" program to function. It simply couldn't work without those terminators.

Similarly, the text we're exchanging consists of words and sentences that are terminated using not zero bytes, but other terminators. I'm very happy that they're not length delineated.


> It simply couldn't work without those terminators.

Yeah, it will. For a related example, I use `grep` all the time to find strings in source code. Source code is not 0 terminated. It works fine.
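To make the point concrete, here is a rough sketch of a scanner that looks for runs of printable bytes and treats any non-printable byte (a NUL, a length prefix, anything else) or the end of the buffer as the end of a run. It is not the actual `strings` program; the function name `count_printable_runs` and the `min_len` threshold are assumptions made for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Count runs of at least min_len consecutive printable bytes in buf.
   A run ends at any non-printable byte or at the end of the buffer;
   a NUL byte is just one of many possible run-enders. */
size_t count_printable_runs(const unsigned char *buf, size_t n, size_t min_len)
{
    assert(buf != NULL || n == 0);
    size_t runs = 0;
    size_t run = 0;
    for (size_t i = 0; i < n; i++) {
        if (isprint(buf[i])) {
            run++;
        } else {
            if (run >= min_len)
                runs++;
            run = 0;
        }
    }
    if (run >= min_len)
        runs++;
    return runs;
}
```

In this sketch a length-prefixed "hello" followed by some other binary byte is found just as readily as a zero-terminated one, provided the bytes around it are not themselves printable.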


I use "grep -w foo" (or something like "grep '\<foo\>'"), because when I look for "foo" I don't want "bazfoobar". grep -w only works because the end of words is signaled in-band (surrounding / terminating words with whitespace).



