I agree filepath related tasks are ugly. But there are a number of reasons for t...

WalterBright · on Aug 10, 2022

> memory allocation, and this is arguably orthogonal to the string representations

A substringz cannot be produced from a stringz without doing an allocation.

> you're free to make substrings using pointer + length or whatever, and this is in many cases the best solution

Right, I can. And it's an ongoing nuisance in C to do so, because it doesn't have proper abstractions to build new types with. Even worse, if I switch my stringz to length delimited, and then pass it to fopen() which wants a stringz, I have to convert my length delimited string to stringz even though it is already a stringz. Because my length delimited API has no mechanism to say it also is 0 terminated.
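To make the fopen() complaint concrete, here is a minimal C sketch, assuming a hand-rolled slice type. The names `str_slice`, `slice_sub`, `slice_to_cstr` and `slice_fopen` are hypothetical and not from any standard library. Taking a substring is free, but every call into a stringz API pays for a copy, because the type cannot record whether a terminator already follows the bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical length-delimited string view; not a standard C type. */
typedef struct {
    const char *ptr;
    size_t len;
} str_slice;

/* Taking a substring is pointer arithmetic only: no allocation, no copy.
   Out-of-range bounds are clamped to the slice. */
str_slice slice_sub(str_slice s, size_t start, size_t end)
{
    assert(s.ptr != NULL);
    if (end > s.len) end = s.len;
    if (start > end) start = end;
    return (str_slice){ s.ptr + start, end - start };
}

/* The slice cannot say whether ptr[len] happens to be '\0', so a stringz
   API always gets a freshly allocated, terminated copy. Caller frees. */
char *slice_to_cstr(str_slice s)
{
    char *z = malloc(s.len + 1);
    if (z == NULL) return NULL;
    memcpy(z, s.ptr, s.len);
    z[s.len] = '\0';
    return z;
}

/* Wrapper showing the conversion as the last step before the C call. */
FILE *slice_fopen(str_slice path, const char *mode)
{
    char *z = slice_to_cstr(path);
    if (z == NULL) return NULL;
    FILE *f = fopen(z, mode);
    free(z);
    return f;
}
```

Even when the slice was built from a string literal, `slice_fopen` still copies, which is the redundancy described above.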

You wind up with two string representations in your code, and then what? Have each string function come in a pair?

Believe me, I've done this stuff, I've thought about it a lot, and there is no happy solution. It annoys me enough that C is just not a tool I want to reach for anymore. I'm just tired of ugly, buggy C string code.

The good news is there is a fix, and I've proposed it, but it gets zero traction:

https://www.digitalmars.com/articles/C-biggest-mistake.html

jstimpfle · on Aug 10, 2022

> You wind up with two string representations in your code, and then what? Have each string function come in a pair?

As said, I don't think this is the end of the world, and I'm likely to add a number of other string representations. It happens rarely, and when it does I don't worry about formatting a string into a temporary buffer for an API before calling it, because most "string" things are small and dispensable. Zero-terminated strings are the cheap plastic solution that just works for passing string literals to printf, and for viewing directly in a binary. And they're compatible with length delineated strings in the sense that you can supply a (cheap plastic) zero-terminated string to a (more serious) length delineated API. It also works the other way: many length delineated APIs are designed to work with both - supply -1 as length, and you can happily put a string literal as argument, don't even have to macro your way with sizeof then to supply the right length.
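The "-1 as length" convention could look roughly like this in C. This is a sketch only: `resolve_len` and `count_byte` are hypothetical names, and real APIs that follow a similar convention (sqlite3_prepare_v2, for instance, treats a negative byte count as "read up to the first zero terminator") differ in the details.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: a negative len means "s is zero-terminated,
   measure it with strlen"; otherwise len is taken as the byte count. */
size_t resolve_len(const char *s, ptrdiff_t len)
{
    assert(s != NULL);
    return len < 0 ? strlen(s) : (size_t)len;
}

/* Example length-delimited function: counts occurrences of byte c
   in the first resolve_len(s, len) bytes of s. */
size_t count_byte(const char *s, ptrdiff_t len, char c)
{
    size_t n = resolve_len(s, len);
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (s[i] == c) hits++;
    }
    return hits;
}
```

A caller with a literal writes `count_byte("a/b/c", -1, '/')`, while a caller holding a slice passes its real length. The cost is one branch per call, plus a strlen whenever the length is omitted.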

> The good news is there is a fix, and I've proposed it, but it gets zero traction

I'm aware of this and I like it ("fat pointers") but I wouldn't like it if the APIs would miss the explicit length argument because there's a size field glued to the slice.

WalterBright · on Aug 10, 2022

> many length delineated APIs are designed to work with both - supply -1 as length, and you can happily put a string literal as argument, don't even have to macro your way with sizeof then to supply the right length.

I'm sorry, I just have to say "no thanks" to that. I don't really want each string function to test the length and run strlen if it isn't there.

By now, the D community has 20 years of experience with length as part of the string type. Nobody wants to go back to the C way. It's probably the most unambiguously successful and undisputed feature of D. C code that gets converted to D gets scrubbed of the stringz code, and the result is cleaner and faster.

D still interfaces with C and C strings. The conversion is done as the last step before calling the C function. (There's a clever way to add a 0 that only rarely requires an allocation.) Any C strings returned get immediately converted with the slice idiom:

    string s = p[0 .. strlen(p)];

> I wouldn't like it if the APIs would miss the explicit length argument because there's a size field glued to the slice.

I bet you would like it! (Another problem with a separate length field is there's no obvious connection between it and the string - which is another source of bugs.)

WalterBright · on Aug 10, 2022

> Last time you were looking at a binary using your editor or pager, how much better has your experience been thanks to NUL terminators?

Not perceptibly better. And yeah, I do look at binary dumps now and then, after all, I wrote the code that generates ELF, OMF, MachO, and MSCOFF object file formats, and librarians for them :-)

jstimpfle · on Aug 10, 2022

I wrote simple ELF and PE/COFF writers too, but independently of that, zero terminators are what lets you find strings in a binary. And what allows the "strings" program to function. It simply couldn't work without those terminators.

Similarly, the text we're exchanging consists of words and sentences that are terminated using not zero bytes, but other terminators. I'm very happy that they're not length delineated.

WalterBright · on Aug 10, 2022

> It simply couldn't work without those terminators.

Yeah, it will. For a related example, I use `grep` all the time to find strings in source code. Source code is not 0 terminated. It works fine.
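For reference, a strings-style scan can be sketched in C as below. This is a simplified illustration under stated assumptions, not the source of the actual `strings` utility: the function name `count_printable_runs` and the `min_len` parameter are hypothetical. It reports runs of printable bytes and ends a run at any non-printable byte or at the end of the buffer.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Counts runs of at least min_len printable bytes in buf[0 .. n).
   A run ends at any non-printable byte (0 or otherwise) or at the
   end of the buffer. */
size_t count_printable_runs(const unsigned char *buf, size_t n, size_t min_len)
{
    assert(buf != NULL || n == 0);
    size_t runs = 0;
    size_t cur = 0; /* length of the printable run in progress */
    for (size_t i = 0; i < n; i++) {
        if (isprint(buf[i])) {
            cur++;
        } else {
            if (cur >= min_len) runs++;
            cur = 0;
        }
    }
    if (cur >= min_len) runs++; /* run that reaches the end of the buffer */
    return runs;
}
```

With such a scan, any non-printable byte separates strings, including a small binary length prefix. Two strings stored back to back with their lengths kept elsewhere would be reported as a single run.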

jstimpfle · on Aug 10, 2022

I use "grep -w foo" (or something like "grep '\<foo\>'"), because when I look for "foo" I don't want "bazfoobar". grep -w only works because the end of words is signaled in-band (surrounding / terminating words with whitespace).