
I feel like Unix utilities should provide a standardized way to generate machine-readable output, perhaps using JSON.


The same information is already available in a machine-readable format. Just call readdir. You don’t need to run ls, have ls call readdir and convert the output into JSON, and then finally parse the JSON back into a data structure. You can just call readdir!
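
To illustrate, here is a minimal sketch of that loop in C (just opendir/readdir/closedir with basic error handling, nothing more):

    #include <dirent.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
      const char *path = (argc > 1) ? argv[1] : ".";
      DIR *dir = opendir(path);
      if (dir == NULL) {
        perror("opendir");
        return 1;
      }
      struct dirent *entry;
      /* Each call hands back one entry as a struct; there is no text to parse. */
      while ((entry = readdir(dir)) != NULL) {
        printf("%s\n", entry->d_name);
      }
      closedir(dir);
      return 0;
    }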


I know, but it would be so great if *every* Unix utility just had the same type of output. By the way, ls does more than just readdir.


Can you call readdir() from a shell easily?

WRT format, I'd prefer CSV.


Here is a trivial program to dump dents to stdout, suitable for shell pipelines. Example usage: `./getdents64 . | xargs -0 printf "%q\n"`

    #define _GNU_SOURCE
    #include <dirent.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #define BUF_SIZE 32768

    struct linux_dirent64 {
      ino64_t d_ino;           /* 64-bit inode number */
      off64_t d_off;           /* Not an offset; see getdents() */
      unsigned short d_reclen; /* Size of this dirent */
      unsigned char d_type;    /* File type */
      char d_name[];           /* Filename (null-terminated) */
    };

    /* Write all len bytes of buf to stdout, retrying after short writes. */
    int writeall(char *buf, size_t len) {
      ssize_t wres = write(1, buf, len);
      if (wres == -1) {
        perror("write");
        return -1;
      }
      if ((size_t)wres < len) {
        return writeall(buf + wres, len - wres);
      }
      return 0;
    }

    int main(int argc, char **argv) {
      if (argc != 2) {
        fprintf(stderr, "usage: %s directory\n", argv[0]);
        return EXIT_FAILURE;
      }
      int fd = open(argv[1], O_DIRECTORY | O_RDONLY);
      if (fd == -1) {
        perror("open");
        return EXIT_FAILURE;
      }
      char *buf = malloc(BUF_SIZE);
      if (buf == NULL) {
        perror("malloc");
        return EXIT_FAILURE;
      }
      ssize_t res = 0;
      do {
        res = getdents64(fd, buf, BUF_SIZE);
        if (res == -1) {
          perror("getdents64");
          return EXIT_FAILURE;
        }
        /* Walk the variable-length records in the buffer; d_reclen
           gives the offset to the next entry. */
        char *it = buf;
        while (it < buf + res) {
          struct linux_dirent64 *elem = (struct linux_dirent64 *)it;
          it += elem->d_reclen;
          /* Write the name including its NUL terminator, so consumers
             like `xargs -0` can split records unambiguously. */
          size_t len = strlen(elem->d_name);
          if (writeall(elem->d_name, len + 1) == -1) {
            return EXIT_FAILURE;
          }
        }
      } while (res > 0);
      return EXIT_SUCCESS;
    }


You’re still doing unnecessary work. You’re turning a list of files into a string, then parsing the string back into words.

Your shell already provides a nice abstraction over calling readdir directly. A glob gives you a list, with no intermediate stage as a string that needs to be parsed. You can iterate directly over that list.

Every language provides either direct access to the C library, so that you can call readdir, or some abstraction over it that makes the process less annoying. In Common Lisp the function `directory` takes a pathname and returns a list of pathnames for the files in the named directory. In Rust, `std::fs::read_dir` gives you an iterator that yields `io::Result<std::fs::DirEntry>`, allowing easy handling of I/O errors and also neatly avoiding an extra allocation. Raku has a function `dir` that returns a similar iterator, with the added feature that it can match the names against a regex for you and yield only the matches. You can fill in more examples from your favorite languages if you want.


POSIX C also has a glob() function you can use to get an array of strings.
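
A minimal sketch, assuming you just want the matches for a pattern printed:

    #include <glob.h>
    #include <stdio.h>

    int main(void) {
      glob_t g;
      /* Expand the pattern into an array of path strings.
         Like shell globbing, "*" does not match dotfiles. */
      if (glob("*", 0, NULL, &g) != 0) {
        return 1;
      }
      for (size_t i = 0; i < g.gl_pathc; i++) {
        printf("%s\n", g.gl_pathv[i]);
      }
      globfree(&g);
      return 0;
    }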

The getdents system call being used in the above program is the basis for implementing readdir.

It doesn't return a string, but rather a buffer of multiple directory entries.

The program isn't parsing a giant string; it is parsing out the directory entry structures, which are variable length and have a length field so the next one can be found.

The program writes each name including the null terminator, so that the output is suitable for utilities which understand that.


The problem is the phrase “suitable for shell pipelines”. If you are in a shell, you should not be doing anything like this. You should use a glob directly in the shell. You should not be calling an external program, having that program print out something, and then parsing it. Just use a glob right there in your shell script. If you do anything else, you are doing it wrong.

Do I really have to say this again?


Wow, these replies. I was being a little sarcastic, as there is no 'readdir' shell command. That is all.


Certainly. Just do `for f in *`. See how easy that is?


`find` is also an option, or shell globs.


Right, globs are syntactic sugar on top of readdir. Definitely use them when you are in a shell. But in general the solution is to call readdir, or some language facility built directly on top of it. Calling ls and asking it for JSON is the stupid way to do things.


Just curious, how would you approach getting output from utilities like "df", "mount" and "parted"?


Generally speaking, can't you limit/define the output of those commands and parse them that way? E.g. `df --portability` or `--total` or `--output`.

And/or use their return codes to verify that something worked or didn't.

Or hope your higher-level programming language has built-ins for filesystem manipulation.


How is that any easier than just giving a standardized --json flag?


It doesn't require trying to organize a small revolution across dozens of GNU tools, many authors, and numerous distros...?

I'd love to see standard JSON output across these tools. I just don't see a realistic way to get that to happen in my lifetime.

Maybe a unified parsing layer is more realistic: an open-source framework that converts command output to JSON, automatically identifying the command variant you're running based on its version and your shell settings, parsing the output for you, and formatting it in a standard JSON schema. Even that would be a huge undertaking, though.

There are a lot, LOT of command variants out there. It's one thing to tweak the output to make it parseable for your one-off script on your specific machine. Not so easy to make it reusable across the entire *nix world.


With regard to parted: if you only want to query for information, there is "partx", whose output was purposely designed to be parsed. I have had good experiences with it.


That doesn't solve the problem that bash is completely useless for manipulating JSON.

It certainly would make writing Python scripts that need to interact with other programs easier. But Python doesn't desperately NEED to interact with so many other programs for simple tasks like enumerating files, making HTTP requests, or parsing JSON, the way bash does.


Bash is useless at JSON now, but there's nothing stopping Bash from introducing native JSON parsing.


Then you have to install the new version of bash on every system where you depend on JSON parsing, negating the argument that bash is installed everywhere.

If bash were ever actually going to get JSON parsing, it would have done so two decades ago, like all the other scripting languages; JSON is 23 years old. So don't hold your breath.




