Learn Awk with Emacs (2020) (jherrlin.github.io)
96 points by nyc111 on Feb 19, 2022 | 30 comments


I have started to use org babel for a lot of things these days, just to write little experiments or do ephemeral tasks. This is especially useful when I want to operate on a db and remember the query, or have an org file that is like a dashboard of views into a db, with lots of notes and links to other notes.
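A dashboard block can be as small as a saved query sitting next to its notes. A minimal sketch, assuming ob-sqlite is enabled (the db path and table are hypothetical):

    #+BEGIN_SRC sqlite :db ~/notes/tasks.db
      SELECT status, count(*) FROM tasks GROUP BY status;
    #+END_SRC
Hitting C-c C-c on the block runs the query and drops the result table right under it.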

I use org-roam, and put a lot of things in dailies, so I have a temporal log of work that is easily searchable (using deft, or just rg). So much of the mental burden of where to put things is gone; they are in my journal now, and I can always tangle them into a file if I need to, and then I just start linking commits into org to continue to keep track.

Speaking of tangling, I also have one big "system config" file that contains all of my rc files and other system configurations and scripts, with sensitive information encrypted transparently with org-crypt. I just keep this all in my shared Nextcloud sync folder. I even have configuration and scripts for my homelab and personal server in other files!

Not only that, but I have started relying on org-attach to keep a repository of miscellaneous files. Have some PDF or zip file I don't want to forget about? I just attach it to the daily document and write some notes about it, and wherever I am I can get it. I have even started compressing old projects and "backing them up" into org.

I work through SICP these days in an org babel document, with liberal tangling and noweb, and now I have a journal to myself of my progress, constantly linking to other nodes as I gain more concepts.

I am not a professional computer person, so I don't work with other people. I understand that this works for me because of that.


> I work through SICP these days in an org babel document

Would you mind sharing a bit more about your workflow? I am going through Crafting Interpreters myself, and I am struggling to organise my code and notes, since I am unable to get org-babel to tangle into separate files and build everything in one go.


Not GP, but I've done this before.

I don't have access to it at the moment, but I wrote a small shell script that invoked emacs (without running init.el/.emacs) and ran org tangle on a file. I incorporated that into a Makefile so it ran on every org file. I placed my org files in the same places (in the file system) as normal source files and had a 1-to-1 mapping of org files to <target language> files (C, Java, Lisp, Go, doesn't matter, I've done it all). 1-to-1 isn't necessary; I've also done one mega-file that tangled into many source files, including into subdirectories. After tangling, you can trigger any particular build system commands needed (like in Rust, run `cargo build` or `cargo run`).
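From memory, the script was just a thin wrapper around Emacs batch mode, something like this (a reconstruction, not the original):

    #!/bin/sh
    # Tangle the org file given as $1, without loading any user init files.
    emacs -Q --batch \
      --eval "(require 'org)" \
      --eval "(org-babel-tangle-file \"$1\")"
The Makefile rule then just ran that over each org file before kicking off the normal language-specific build.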

Another thing I've done for smaller things is something like:

  #+BEGIN_SRC language :tangle foo.language
  ...
  #+END_SRC
  #+BEGIN_SRC sh
    build command foo.language
    ./foo
  #+END_SRC
Run C-c C-v t (to tangle the source file(s)) and then navigate to that last block and use C-c C-c to execute it, which will execute the language-specific build commands and then run the executable (adjust to particular circumstances). You can also have many of those shell blocks to run different things or in different ways (one to build a release version, another to build a debug version, another to run all the tests, etc.).


Ah, that actually seems extremely interesting. Thanks for clearing up the multi-file bit of it!


Emacs Org Babel is amazing! I was also practicing awk examples from The AWK Programming Language a few years back in a similar fashion: https://scripter.co/notes/awk/.

And it's not just awk, there are Org Babel packages available for virtually all the languages!

- Nim (my most comprehensive set of notes): https://scripter.co/notes/nim/

- Tcl: https://scripter.co/notes/tcl/

- String formatting in Nim and Python: https://scripter.co/notes/string-fns-nim-vs-python/

- PlantUML: https://scripter.co/notes/plantuml/

In all the notes pages above, the result of the code blocks is seen directly in the Emacs buffer when I hit C-c C-c. Then I simply* export all those notes to Markdown and publish them using Hugo.

* Tangent: That's one of the main reasons why I went down the path of developing ox-hugo.


Do you have to install and configure each language in emacs or org mode? Or is just having them in your system PATH enough?

I'll get into this if I don't have to twiddle with each one individually, but I know emacs well enough to suspect that's how it'll be.


No, having the related executable in the PATH before you start emacs is enough.

Then you need to install the Org Babel packages that don't ship with Emacs or Org mode, e.g. ob-tcl and ob-nim.
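After installing those (from MELPA, for instance), you enable the languages you use in your init. A sketch of the usual incantation (the exact keys come from each package's docs):

    (org-babel-do-load-languages
     'org-babel-load-languages
     '((awk   . t)   ; ships with Org
       (shell . t)   ; ships with Org
       (nim   . t))) ; provided by the ob-nim package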


Somewhat tangential, but the author calls out using `grep` in conjunction with `awk`. Anytime I find myself doing this, it usually turns out that I can just throw the pattern into AWK itself.

    grep 'foo' file.txt | awk '{ print $1 }'
Becomes:

    awk '/foo/ { print $1 }' file.txt
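The same move covers other common grep flags too, for instance (sketches):

    grep -v 'foo' file.txt    # becomes:  awk '!/foo/' file.txt
    grep -c 'foo' file.txt    # becomes:  awk '/foo/ { n++ } END { print n+0 }' file.txt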

There may be times when `grep` is preferable, but this is ubiquitous enough that it's mentioned in the "Useless Use of Cat" [1] awards.

[1] https://porkmail.org/era/unix/award


Both the ‘useless use of cat’ and this nitpick miss that a unix-style command line is very much like a bunch of functions feeding results into each other, with each doing its own thing. This allows one to chain different functions in the same general way: namely, if I want to limit the input at first for debugging, I can do `cat stuff | head -10 | rest-of-commands`, or `grep stuff | head -10 | awk`—and I won't need to keep dragging the `<stuff` input around between commands, or to move the regex from awk to grep and back again or put the limit into the neatly written awk command.


Agreed, this is one of the primary principles of the Unix Philosophy [1].

I was curious how performance would be impacted, and for a large file `grep` may not be so useless.

    $ du -h /tmp/file.json  
    161M /tmp/file.json

    $ time awk -F: '/"id"/ { print $1 }' /tmp/file.json >/dev/null

    real 0m17.810s
    user 0m17.690s
    sys 0m0.083s

    $ time (grep '"id"' /tmp/file.json | awk -F: '{ print $1 }' >/dev/null)

    real 0m3.617s
    user 0m3.641s
    sys 0m0.037s
While pushing the filter into AWK is a bit easier on the eyes, it does appear to incur a performance cost.

[1] http://www.catb.org/~esr/writings/taoup/html/ch01s06.html


Yeah, grep will generally be faster than awk for finding patterns in files, since that's the only thing it does, and it can (and does) take advantage of those constraints to apply performance optimizations.

Personally I don't think either approach is "better". Sometimes it's easier to deal with some complex logic if it's all in a single awk program, rather than smeared across combinations of grep, awk, and other things. If that outweighs the perf drop in some specific situation, then I'll do it.


You might get an even better speedup by using `grep -F` or ripgrep.
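For the example above, that would look something like this (I haven't timed these on the same file, so the speedup is an assumption):

    grep -F '"id"' /tmp/file.json | awk -F: '{ print $1 }'
    rg -F '"id"' /tmp/file.json | awk -F: '{ print $1 }'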


Can you share the file.json? I'd like to understand this difference better.


Unfortunately I picked something from my day job with a bunch of proprietary data.

It was essentially a large list of pretty-printed objects with about 20 attributes each, a few of which had multi-paragraph string values.


This is one of those things that really irks me about the Unix shell.

You're totally right that commands like `cat *.txt | grep foo` look right and feel intuitive, but the problem is that it's not actually equivalent to `grep foo *.txt`. With a single input file, it's probably not an issue, but with multiple files the downstream command can't actually see what file it's reading from, so in this example grep can't report filenames with matches and you can't use flags like --include/--exclude based on filenames.

I keep wishing for a Unix-like shell and utilities where `cat *.txt | grep foo` actually works the same as `grep foo *.txt`. PowerShell has pulled this off to some extent, which is cool, but it doesn't quite feel right to me compared to Bash.
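A tiny illustration, with hypothetical files:

    $ grep foo a.txt b.txt         # grep knows the filenames
    a.txt:foo bar
    b.txt:foo baz
    $ cat a.txt b.txt | grep foo   # one anonymous stream; the filenames are gone
    foo bar
    foo baz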


I've made this point in the past, but I find it absolutely hilarious to see the number of kids these days who are rediscovering awk. Awk was dead; like dead-dead, totally useless legacy silliness, for decades. And the reason was that literally everything you could write in awk could be more easily and more powerfully expressed in perl. In a world where every system comes with perl and every admin knows perl, there's no room for awk.

But then... perl kinda died back. It's no longer default in many distros. Most new unix kids aren't learning it. Perl is the old weirdness.

And... in a world without perl, awk looks pretty cool I guess. But folks: perl is still there.


Awk has an enormous advantage over Perl: the Awk manual is short, and you can conceivably learn Awk just by reading the man page. If I go back years later and read an Awk program, I’m more likely to understand it. Using Perl for many tasks would be like using Matlab to add up a column of numbers. You can, but there’s also something to be said for just using a pocket calculator.


Heck, back in the mid-80s we wrote a stripped down awk as part of a 2nd year CS course. Can't do that with Perl.


I love Perl, but for one liners, awk is often easier to remember than something like Perl's autosplit. Awk is often faster too, for large data files, mawk especially.
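For reference, the autosplit equivalent of awk's '{ print $2 }' is:

    perl -lane 'print $F[1]'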


Say I want to print the Nth column of a program's output: how easy is that compared to awk?


Heh, sure:

    perl -lne 'print((split())[1])'
While it's true that "{print $2}" is a little shorter, it's also quite limited. The perl syntax extends naturally to different delimiters (the first argument to split is a regex) or actions (maybe you don't want to print it and want to do some tiny string processing).

Essentially, you picked the One Single Task at which awk is best. And... it's only barely better, and only for the specific variant of the problem for which awk was designed.

Again, this argument was had and settled. No one in their right mind would have been caught writing an awk script in 1997. That doesn't mean awk is bad (it's software from the mid-70s!), it means it was superseded. And it's important for people interested in those sorts of problems to understand why it got replaced.


Very interesting! I would like to learn awk and this could come in handy.

I love emacs. These days I increasingly use it for more and more tasks, from reading pdfs to writing notes to email. Despite this, something about "doing X with emacs" is starting to bother me for some reason. It's hard to pin down why I feel this way; does anyone else have a similar experience?


I think so. I am a big emacser but I sometimes feel that trying to do everything in emacs is less elegant than having separate special-purpose tools, and leads to spending more time on configuration to integrate all the packages and manage all my frames.

You could say that I want emacs-like behavior everywhere, but not everything inside of emacs.


> You could say that I want emacs-like behavior everywhere, but not everything inside of emacs.

I think you really hit the nail on the head with this statement. It's why I hope projects like the Nyxt browser continue to improve.


I feel this is part of why I enjoy Acme more than emacs: everything just connects.


I love org-babel, but am I alone in thinking it’s probably best-used for well-understood workflows? Prototyping something with org-babel usually forces me to kill emacs when it hangs for any number of reasons–I imagine it’s choking while trying to reformat as a table. I’ve experimented with ob-async, but it seems unreliable so far. I wish emacs really were more like an os and could multitask a bit better.


Adding `:results raw` will remove the table formatting step. Otherwise, I agree, multitasking would be a boon.
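That goes in the block header. A sketch, with a placeholder command:

    #+BEGIN_SRC sh :results raw
      some-long-running-command
    #+END_SRC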


How big is your table?

I use org babel all the time and haven't had any reliability issues.


I learnt it the org babel way too, and published my notes when I finished the chapters of the book: https://psibi.in/awk/

The whole interactive experience of evaluating an awk script and tinkering with it in a single place helped greatly while learning it.


You could also learn awk on the bash command line and save your awk-related command history. Keep your test data, awk script, and awk command history in one directory so that you can review it later on if needed. Just keep it simple. Goes for bash scripting, sed, etc. as well.
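For example (an illustrative layout; the filenames are made up):

    $ ls awk-practice/
    data.txt  notes.txt  parse.awk
    $ awk -f parse.awk data.txt           # re-run the saved script on the test data
    $ history | grep 'awk ' >> notes.txt  # keep the one-liners you tried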



