
When I type "ls", I'm getting a text-serialized list of objects. Why can't I just get the (serializable) list directly? So that I don't have to mess with (implicitly) converting text back to objects, often involving regex and hacks.


Because the objects are probably either relatively deeply encoded inside hundreds of plain-C stack and heap locations, or they're not even fully resident in RAM anymore by the time output occurs.

It's not that uncommon for ol' C hackers to directly write those stack and heap locations out to disk and call it a "file format." Trouble is, you're almost entirely at the whims of your platform and compiler as to what the actual layout of that is.

If you're thinking, "well that's dumb, why doesn't C have a standardized representation for those in-memory objects that hides platform differences", it does: printf and scanf.

Text isn't necessarily a great answer to that problem, but it definitely is an answer. Others include packed structs with htonl and friends and low-overhead serialization formats like protobuf, Thrift, and Avro. Inside, say, Google, you have "everything is a protobuf" instead of "everything is text," and it does end up working roughly as well as you might expect. That is to say, reasonably well, but with its own sets of problems that people won't ever stop complaining about.
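
As a rough illustration of the "packed structs with htonl" option, here is a minimal Python sketch. The record layout (inode, size, mode) and the names `RECORD`, `pack_record` and `unpack_record` are made up for the example; the point is only that fixing the byte order and padding explicitly gives the same bytes on every platform, which a raw memory dump does not.

```python
import struct

# Hypothetical record layout, for illustration only:
# inode (unsigned 64-bit), size (unsigned 64-bit), mode (unsigned 32-bit).
# "!" selects network (big-endian) byte order with no padding, which is the
# job htonl and friends do for C structs.
RECORD = struct.Struct("!QQI")


def pack_record(inode: int, size: int, mode: int) -> bytes:
    """Serialize one record into a platform-independent byte string."""
    return RECORD.pack(inode, size, mode)


def unpack_record(data: bytes) -> tuple[int, int, int]:
    """Inverse of pack_record; raises struct.error on a wrong-sized input."""
    return RECORD.unpack(data)
```

The trade-off is the one described above: the bytes are compact and portable, but no longer something you can read in a terminal.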


You are not supposed to parse "ls" (unless this is "ls -1"), because that format is only for humans, and defaults change all the time.

If you are parsing "ls -l" output, you are doing something wrong.

Use your language's built-in features; every language has them (for example, in bash, use *-expansion and the [ command). If they don't work for some reason, there are "stat -c" and "find .. -printf" (both GNU extensions), which produce text that is trivial to parse.
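
For a language other than bash, a sketch of what "use the built-in features" can look like in Python. `Entry` and `list_dir` are names invented for this example; `os.scandir` and the `stat` fields are standard library.

```python
import os
from typing import NamedTuple


class Entry(NamedTuple):
    """One directory entry with a few stat fields (illustrative selection)."""
    name: str
    size: int
    mode: int
    mtime: float


def list_dir(path: str = ".") -> list[Entry]:
    """Return the entries of `path` as structured records, sorted by name."""
    entries = []
    with os.scandir(path) as it:
        for e in it:
            # Do not follow symlinks, so a dangling link still gets a record.
            st = e.stat(follow_symlinks=False)
            entries.append(Entry(e.name, st.st_size, st.st_mode, st.st_mtime))
    return sorted(entries)
```

No text is parsed at any point, so odd characters in file names are not a concern.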


The reason you're not supposed to parse ls output doesn't have to do with the default formatting; I'm quite sure all widely used implementations of ls have always printed the equivalent of "ls -1" when the output is directed to a non-tty. The actual problem is that UN*X file names can contain any byte except NUL and "/", including newlines, so if you don't plan to place additional restrictions on the file names you support, you can't reliably parse ls output at any point.
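
A small sketch of the newline problem, in Python and without calling ls at all: it creates one file whose name contains a newline, then splits a newline-delimited listing the way a naive script would. `naive_names` and `demo` are names made up for the example, and it assumes a typical Unix filesystem that accepts such names.

```python
import os
import tempfile


def naive_names(listing: str) -> list[str]:
    """Split a newline-delimited listing, as a script parsing `ls -1` would."""
    return listing.split("\n")


def demo() -> tuple[list[str], list[str], list[str]]:
    """Return (real names, names recovered via newlines, names recovered via NUL)."""
    with tempfile.TemporaryDirectory() as d:
        # One single file; its name happens to contain a newline.
        open(os.path.join(d, "a\nb"), "w").close()
        real = os.listdir(d)
        by_newline = naive_names("\n".join(real))
        # NUL cannot occur in a file name, so it is a safe delimiter.
        by_nul = "\0".join(real).split("\0")
        return real, by_newline, by_nul
```

The newline-delimited version reports two files where there is one; the NUL-delimited version round-trips correctly.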

Of course, ls does more than just list file names so it can be tempting to utilise its features. coreutils ls has (relatively) recently received an additional output format that can be unambiguously parsed, but that's as far as the portability goes.


Like with Git, the usual set of common *nixy tools (coreutils as well as certain shell built-ins) contains tools in both the 'porcelain' (meant mostly for human consumption) and 'plumbing' (meant mostly for scripting and constructing pipelines) categories. Perhaps tutorials and reference documentation should place more emphasis on which is which, and why. Using the right tool for the job leads to better results and less frustration.


> If you are parsing "ls -l" output, you are doing something wrong.

> Use your language built-in features

The issue is that current shells _don't_ have a built-in structured data type for e.g. a `struct stat`. But why not?

`ls` calls some APIs and gets some in-memory `struct stat`s populated by the kernel. Then it throws away 90% of the data that the kernel copied to user space and then serializes it as text. Why not pass the structs themselves to the next process? We can't currently (except with powershell?), so you have to write actual code.

This is a "bright line" between code and shells that could very well be blurred, but it would take an agreed-upon serialization format for posix-y data structures.


One of the big advantages of the shell is its exploratory nature -- you write pipelines one step after another, looking at the intermediate result at each step. Any sort of complex serialization will break this.

This is why in shell, if you want programmatic "stat" output, you use the "stat" tool, not "ls". "ls" prints everything at once. "stat" has a custom output format, which means you print exactly the fields you want, so your intermediate results are concise and readable. And since every element of the "struct stat" is an integer, a simple space-separated format works very well.
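
To show why all-integer fields make the text format easy, here is a Python sketch in the spirit of a custom stat format. The field selection (size, mtime, permission bits) and the names `stat_line` and `parse_stat_line` are assumptions for the example, not the output of any particular tool.

```python
import os


def stat_line(path: str) -> str:
    """Size, mtime (whole seconds) and permission bits as space-separated integers."""
    st = os.lstat(path)
    return f"{st.st_size} {int(st.st_mtime)} {st.st_mode & 0o7777}"


def parse_stat_line(line: str) -> tuple[int, int, int]:
    """Parse a line produced by stat_line back into three integers."""
    size, mtime, perms = (int(field) for field in line.split())
    return size, mtime, perms
```

Parsing is a single split because no field can contain whitespace. The file name is deliberately left out; it is the one field where that guarantee does not hold.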

(that said, I would not mind seeing more tools print JSON or JSONlines. I add this functionality to many of the tools I write, and it is pretty powerful in conjunction with jq)
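
A minimal sketch of what JSON Lines output could look like for stat data, in Python. The field names and the helpers `stat_record` and `to_jsonl` are invented for illustration; they are not taken from any existing tool.

```python
import json
import os


def stat_record(path: str) -> dict:
    """Collect a few lstat fields for `path` into a JSON-serializable dict."""
    st = os.lstat(path)
    return {
        "name": path,
        "size": st.st_size,
        "mode": st.st_mode,
        "uid": st.st_uid,
        "gid": st.st_gid,
        "mtime": st.st_mtime,
    }


def to_jsonl(paths: list[str]) -> str:
    """One JSON object per line; newlines inside names are escaped by json.dumps."""
    return "".join(json.dumps(stat_record(p)) + "\n" for p in paths)
```

Output in this shape can be piped straight into something like `jq .size`, and a file name containing a newline still occupies exactly one line.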


If you are in a situation where you are parsing the output of ls, you are already writing code, and a shell designed for interactive use is a terrible tool for that job. Use something else. There are many better high-level programming or scripting languages, which both make it easier to write parsing code and remove most of the need to do so, since their APIs usually give you structured data to begin with.


You're looking for nushell:

https://github.com/nushell/nushell


Wow, that's pretty great, I'm going to try that out.


> When I type "ls", I'm getting a text-serialized list of objects. Why can't I just get the (serializable) list directly?

Because then the output of ls can be used by programs that don't know what a list is, or what a file is; and this is breathtakingly beautiful.


Make it auto-serialize to text if the receiving end doesn't understand lists. Working with text directly doesn't get me any type safety, or any safety at all (unless I'm writing a one-off, where it matters less).


"type safety" is the "structured programming" of the twenties


...As in, it will become so universal that languages without it are utterly unheard of, except for assembly and joke languages?


Ok cool I can pipe ls into gz and pipe further into scp. Now what? How do I use it on the receiving end after unzipping it again? You still have to parse it into a structural list of files over there somehow to make use of it? Not everything is a tunnel.

Structured data can still be serialized easily; take JSON, for example, whose spec fits on a napkin.


You can. Use your favorite programming language's interface for listing the content of a directory!

`ls` is for you, the human.


In C, my favourite language, I would do this:

        system("ls");


Please never do this; do not use system. It runs the command through the shell (sh -c), and you can fall victim to various PATH munging attacks. :)
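
The same concern applies to `os.system` in Python. If an external program really has to be run, a hedged sketch of the usual alternative is to skip the shell and resolve the binary against a fixed search path. `TRUSTED_PATH` and `run_ls` are names made up for this example, and the directory list is an assumption about where system binaries live on the host.

```python
import shutil
import subprocess

# Assumption: system binaries live in these directories on this host.
TRUSTED_PATH = "/usr/bin:/bin"


def run_ls(directory: str) -> bytes:
    """Run ls on `directory` without a shell and without the caller's PATH."""
    ls = shutil.which("ls", path=TRUSTED_PATH)
    if ls is None:
        raise FileNotFoundError("ls not found on the trusted path")
    result = subprocess.run(
        [ls, "-1", "--", directory],  # argv list: no shell parses this
        check=True,
        stdout=subprocess.PIPE,
        env={"PATH": TRUSTED_PATH, "LC_ALL": "C"},  # minimal, fixed environment
    )
    return result.stdout
```

This only removes the shell and the PATH lookup from the picture; the output is still text meant for humans, with the parsing caveats discussed elsewhere in the thread.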


As I've mentioned in other discussions here, I find jc useful for exactly this purpose -- https://github.com/kellyjonbrazil/jc



