The same information is already available in a machine-readable format. Just call readdir. You don’t need to run ls, have ls call readdir and convert the output into JSON, and then finally parse the JSON back into a data structure. You can just call readdir!
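For what it’s worth, here is a minimal C sketch of exactly that: calling readdir and getting structured data with no text parsing anywhere. The `"."` path is just an example.

```c
#include <dirent.h>
#include <stdio.h>

/* Minimal sketch: list a directory by calling readdir directly.
   No ls, no JSON, no string splitting. "." is just an example path. */
int main(void)
{
    DIR *dir = opendir(".");
    if (dir == NULL) {
        perror("opendir");
        return 1;
    }

    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        /* entry->d_name is already a separate field per entry;
           nothing needs to be split or unescaped. */
        printf("%s\n", entry->d_name);
    }

    closedir(dir);
    return 0;
}
```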
You’re still doing unnecessary work. You’re turning a list of files into a string, then parsing the string back into words.
Your shell already provides a nice abstraction over calling readdir directly. A glob gives you a list, with no intermediate string that needs to be parsed. You can iterate directly over that list.
Every language provides either direct access to the C library, so that you can call readdir yourself, or some abstraction over it that makes the process less annoying. In Common Lisp the function `directory` takes a pathname and returns a list of pathnames for the files in the named directory. In Rust, `std::fs::read_dir` gives you an iterator that yields `io::Result<std::fs::DirEntry>`, which makes I/O errors easy to handle and neatly avoids an extra allocation. Raku has a function `dir` that returns a similar iterator, with the added feature that it can match the names against a regex for you and yield only the matches. You can fill in more examples from your favorite languages if you want.
POSIX C also has a glob() function you can use to get an array of strings.
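Something like this, as a sketch; the `"*.c"` pattern is just an example:

```c
#include <glob.h>
#include <stdio.h>

/* Sketch of POSIX glob(): expands a pattern into an array of strings,
   one path per element, with no output parsing involved.
   The "*.c" pattern is just an example. */
int main(void)
{
    glob_t results;
    int rc = glob("*.c", 0, NULL, &results);

    if (rc == 0) {
        for (size_t i = 0; i < results.gl_pathc; i++)
            printf("%s\n", results.gl_pathv[i]);
        globfree(&results);
    } else if (rc != GLOB_NOMATCH) {
        fprintf(stderr, "glob failed: %d\n", rc);
        return 1;
    }

    return 0;
}
```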
The getdents system call used in the above program is the basis on which readdir is implemented.
It doesn't return a string, but rather a buffer of multiple directory entries.
The program isn't parsing a giant string; it is parsing out the directory entry structures, which are variable-length and carry a length field so the next one can be found.
The program writes each name including its null terminator, so the output is suitable for utilities that understand that.
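As a rough sketch of that mechanism (not the program being discussed, which isn't shown here), this is what walking a getdents64 buffer looks like on Linux. The struct layout follows the getdents64(2) man page, `"."` is just an example directory, and names are printed newline-separated here rather than NUL-terminated:

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/* getdents64 fills a buffer with variable-length directory entry records,
   each carrying its own length field (d_reclen) so the next record can be
   located. glibc does not expose this struct, so it is declared here by
   hand, following the getdents64(2) man page. */
struct linux_dirent64 {
    unsigned long long d_ino;     /* inode number */
    long long          d_off;     /* offset to next entry */
    unsigned short     d_reclen;  /* length of this record */
    unsigned char      d_type;    /* file type */
    char               d_name[];  /* null-terminated name */
};

int main(void)
{
    int fd = open(".", O_RDONLY | O_DIRECTORY);  /* "." is just an example */
    if (fd == -1) { perror("open"); return 1; }

    char buf[8192];
    for (;;) {
        long n = syscall(SYS_getdents64, fd, buf, sizeof buf);
        if (n == -1) { perror("getdents64"); return 1; }
        if (n == 0) break;  /* no more entries */

        /* Walk the buffer record by record using d_reclen. */
        long pos = 0;
        while (pos < n) {
            struct linux_dirent64 *d = (struct linux_dirent64 *)(buf + pos);
            printf("%s\n", d->d_name);
            pos += d->d_reclen;  /* jump to the next variable-length record */
        }
    }

    close(fd);
    return 0;
}
```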
The problem is the phrase “suitable for shell pipelines”. If you are in a shell, you should not be doing anything like this: calling an external program, having it print something out, and then parsing that output. Just use a glob right there in your shell script. If you do anything else, you are doing it wrong.
Right, globs are syntactic sugar on top of readdir. Definitely use them when you are in a shell. But in general the solution is to call readdir, or some language facility built directly on top of it. Calling ls and asking it for JSON is the stupid way to do things.
It doesn't require trying to organize a small revolution across dozens of GNU tools, many authors, and numerous distros...?
I'd love to see standard JSON output across these tools. I just don't see a realistic way to get that to happen in my lifetime.
Maybe a unified parsing layer is more realistic: an open-source command-output-to-JSON framework that automatically identifies the command variant you're running based on its version and your shell settings, parses the output for you, and formats it in a standard JSON schema? Even that would be a huge undertaking, though.
There are a lot, LOT of command variants out there. It's one thing to tweak the output to make it parseable for your one-off script on your specific machine. Not so easy to make it reusable across the entire *nix world.
With regard to parted: if you only want to query for information, there is "partx", whose output was purposefully designed to be parsed. I have had good experiences with it.
That doesn't solve the problem that bash is completely useless for manipulating JSON.
It certainly would make writing Python scripts that need to interact with other programs easier. But Python doesn't desperately NEED to interact with so many other programs for simple tasks like enumerating files, making HTTP requests, or parsing JSON, the way bash does.
Then you have to install the new version of bash on every system where you depend on JSON parsing, which negates the argument that bash is installed everywhere.
If bash was ever actually going to get JSON parsing, it should have done it two decades ago like all the other scripting languages, since JSON is 23 years old. So don't hold your breath.