Hacker News
Bash one-liner to produce a list of HEX color codes that read like English words (gist.github.com)
199 points by ailef on Oct 1, 2022 | 90 comments



I tried this a few years ago; http://canonical.org/~kragen/sw/dev3/colors.html has them as foreground colors and http://canonical.org/~kragen/sw/dev3/colors.2.html has them as background colors. I tested 3-letter words as well as 6-letter words, and used 1 as "l" as well as "I", but I didn't try aghasemi's very productive suggestion of using 5 as S. I don't remember if it didn't occur to me or if I tried it and didn't like the results.

Some of them are pretty #bad (#011 doesn't really look much like "oil") and some, though they read quite well, correspond to awful colors; you might even say, #faeca1 colors. Still, I've made my #bed, #0dd as it may be; now I must #11e in it. I think I've #fed you enough #babb1e for today.


The gist is really a pipeline of Unix commands, with no bash necessarily involved. Here it is in ShellCheck-compliant, 100% bash:

    #!/usr/bin/env bash
    shopt -s nocasematch
    while read -r word; do
        if [[ $word =~ ^[abcdefoi]{6}$ ]]; then
            # Uppercase first so O and I are replaced whatever their original case.
            word=${word^^}
            word=${word//O/0}
            word=${word//I/1}
            printf '#%s\n' "$word"
        fi
    done < /usr/share/dict/words

This could be collapsed to one line with semicolons. On the macOS 12.6 dictionary I get 59 words.

Edit: and in sed, which someone just asked me for elsewhere:

    sed -n -e '
    /^[abcdefoi]\{6,6\}$/I {
    s/o/0/g;
    s/i/1/g;
    s/^/#/;
    y/abcdef/ABCDEF/;
    p;}' < /usr/share/dict/words


     sed -n -e 'y/abcdefOoIi/ABCDEF0011/' -e 's/^[A-F01]\{6\}$/#&/p' /usr/share/dict/words


I’d golf with you but I think you got a hole in one there. I didn’t spend any time thinking about how to make the sed more compact. I sorta just translated what I’d already written in bash.


Thanks for this. I'd probably call the original the GNU coreutils version. The linked GitHub gist also has a sed-only version in the comments. It's instructive to see the different versions.


> I'd probably call the original the GNU coreutils version.

Why? The only GNUish bit is the grep -P option, which is unnecessary (-E will do as well).


I would have considered tr to be part of GNU coreutils. Awk, not necessarily, but the default on a Mac is gawk, I believe.


tr predates GNU by about a decade.


I just added a sed version as well. I'll have to click through and see how closely it resembles what's in the gist.

bash is actually pretty powerful if you don't mind its baroque syntax. Writing it in POSIX would be a bit more challenging. You could use a case statement for the pattern matching, but I'm not sure about the substitution.


Never mind the colors.

This snippet demonstrates how a number of small tools, each doing its narrow job and strung together via the most trivial interface, produce a non-trivial result.

This composability is still unreachable to the vast majority of GUI tools.


The non-trivial part here is actually the source data (the dict file). It is also its pitfall: after adding 5 for S you would expect to see a litany of plurals. Most dict files (for English, anyway), however, seem to omit plural nouns. I guess the logic is that in English most plurals are regular, and the naive algorithm for deriving them from the singular forms (correct most of the time) is quite trivial.
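A minimal Python sketch of that idea, under two stated assumptions: the "naive algorithm" is simply appending "s" (wrong for words like "box" or "city"), and S is read as 5. It keeps only the naive plurals that become six hex-like characters; the short inline list stands in for /usr/share/dict/words.

```python
import re

# Letters that are hex digits or can pass for one (O->0, I->1, S->5).
HEX_LIKE = re.compile(r"[a-fois]{6}", re.IGNORECASE)
TO_HEX = str.maketrans("abcdefoisOIS", "ABCDEF015015")


def naive_plural_colors(words):
    """Yield colour codes for naive plurals (word + 's') that read as hex."""
    for word in words:
        plural = word.strip() + "s"  # naive rule: right for regular nouns only
        if HEX_LIKE.fullmatch(plural):
            yield "#" + plural.translate(TO_HEX)


print(list(naive_plural_colors(["abode", "decoy", "facade", "code"])))  # ['#AB0DE5']
```

Only "abodes" survives here: "decoys" contains a y, and the other plurals have the wrong length.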


SaaS companies hate this one weird trick!


While it is a neat trick as a one-liner, I would recommend against doing anything like this in any software that requires maintenance. The code is hard or impossible to follow, and it has no comments. It's brittle, and only a few people can understand what it really does. A better option would be 10 lines of Python or JavaScript with some comments.


I thought it was trivial to understand, though the comment above it helps a lot, and it's maybe an unfair advantage that I'd done the same thing in pretty much the same way four years ago. It probably depends on your background; I wouldn't write it that way for people who didn't know shell, just like I wouldn't write this comment in English for people who speak only Spanish.

I'm not convinced that it's easier to understand in Python (even though I simplified it a bit, in part because one piece of the Python 3 braindamage was moving string.maketrans to bytes):

    import re

    # Six characters, each a hex letter or a letter that can stand in for a digit.
    HEX_LIKE = re.compile(r'[A-FOI]{6}$')


    def main(words):
        for word in words:
            word = word.strip().upper()
            if HEX_LIKE.match(word):
                print('#' + word.replace('I', '1').replace('O', '0'))


    if __name__ == '__main__':
        with open('/usr/share/dict/words') as words:
            main(words)

I think the shell version is clearly better for interactive improvisation, though.
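For what it's worth, Python 3 still has a str-level translation table: `str.maketrans` is a static method on `str`, alongside `bytes.maketrans`. A sketch of the same filter built on it, taking a list of lines so it can be tried without a dictionary file (the sample words are illustrative only):

```python
import re

HEX_LIKE = re.compile(r"[A-FOI]{6}")
# One translation table instead of chained replace() calls.
TO_DIGITS = str.maketrans("OI", "01")


def hex_words(lines):
    """Return '#XXXXXX' codes for six-letter words spelled with A-F, O and I."""
    codes = []
    for line in lines:
        word = line.strip().upper()
        if HEX_LIKE.fullmatch(word):
            codes.append("#" + word.translate(TO_DIGITS))
    return codes


print(hex_words(["coffee\n", "Decide\n", "zebras\n", "abode\n"]))  # ['#C0FFEE', '#DEC1DE']
```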


I prefer search with an explicit '^' in the pattern to using match. For a throw-away script I'd probably do this:

    import re
    is_hex_like = re.compile(r"^[a-foi]{6}$", re.I).search
    for word in filter(is_hex_like, open("/usr/share/dict/words")):
        hexword = word.upper().replace("O", "0").replace("I", "1").rstrip()
        print(f"#{hexword}")


findall and multiline mode make it even easier, at the cost of loading the whole file into memory, though; for that reason your alternative is probably better.

    import re
    wordlist = open("/usr/share/dict/words").read()
    for word in re.findall(r"^[a-foi]{6}$", wordlist, re.IGNORECASE | re.MULTILINE):
        hexword = word.upper().replace("O", "0").replace("I", "1")
        print(f"#{hexword}")


That's nicer than my version! I'm curious why you prefer search(), though.


1. I don't have to remember which one implicitly anchors to the start of the string and which doesn't.

2. I prefer the explicitness of '^' (maybe that's just another way of stating 1).

3. I can use re.M to make '^' match at the start of each line in multiline strings, whereas match only ever tries the start of the string.

4. The asymmetry of anchoring the front but not the end is weird. Python now has fullmatch, but ugh, just use the pattern for that if you need it.

5. Off the top of my head, I can't think of another language that has a regex function that implicitly anchors the front.
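A small sketch of points 3 and 4 using only the standard `re` module; the sample strings are made up for illustration:

```python
import re

lines = "cafe\nbabe\ndecade"

# match() only tries position 0 of the string; re.M does not change that.
match_result = re.match(r"^babe", lines, re.M)

# search() with an explicit '^' plus re.M can match at the start of any line.
search_result = re.search(r"^babe", lines, re.M)

# match() anchors only the front, so a longer word still "matches"...
prefix_match = re.match(r"[a-f]{6}", "decades")
# ...while fullmatch() anchors both ends.
full_match = re.fullmatch(r"[a-f]{6}", "decades")

print(match_result)           # None
print(search_result.group())  # babe
print(prefix_match.group())   # decade
print(full_match)             # None
```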


Hmm, I see. Interesting! I think of regexps as state machines, so I think of the implicit loop to find a starting position as extra complexity, which can, for example, give rise to performance problems, though it's true that in many languages you can't avoid it.
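To make the "implicit loop" concrete, here is a sketch that emulates `search()` by attempting an anchored `match()` at every offset, using the `pos` argument of compiled patterns. It only mirrors the semantics for patterns without `^`; real engines usually optimise the scan, so it says nothing precise about actual cost. `search_by_looping` is a hypothetical helper name.

```python
import re


def search_by_looping(pattern, text):
    """Emulate pattern.search(text) by trying pattern.match at each offset.

    Only equivalent for patterns that do not use '^'.
    """
    for pos in range(len(text) + 1):
        found = pattern.match(text, pos)  # anchored attempt starting at pos
        if found:
            return found
    return None


hex_run = re.compile(r"[a-f]{6}")
text = "a bad idea: decade"
looped = search_by_looping(hex_run, text)
builtin = hex_run.search(text)
print(looped.group(), looped.start())   # decade 12
print(looped.span() == builtin.span())  # True
```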


Comments can be added. Understanding it requires learning the tools. Just like understanding python or javascript requires learning python or javascript. It's not impossible to follow.


I understood it instantly on first read. Probably depends on how much shell you write.


Ah yes, the Unix Way


It's missing #DADB0D


I look forward to your improved version that tests against the Cartesian product of /usr/dict/words with itself plus the empty string and maybe some slang words like "bod". I suggest you limit to shortish words before the Cartesian product rather than after.
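A sketch of that suggestion, assuming "shortish" means at most six letters and that O and I count as 0 and 1: filter first, add the empty string so single words still qualify, then take the product and keep six-character results. The inline word list is a stand-in for the real dictionary.

```python
import itertools
import re

HEX_LIKE = re.compile(r"[a-foi]+", re.IGNORECASE)
TO_HEX = str.maketrans("abcdefoiOI", "ABCDEF0101")


def hex_phrases(words, length=6):
    """Return colour codes spelled by one word or two concatenated words."""
    # Filter *before* the product: only short, hex-like words survive.
    short = {w.strip() for w in words}
    short = {w for w in short if len(w) <= length and HEX_LIKE.fullmatch(w)}
    short.add("")  # pairing with "" lets single words through
    codes = set()
    for first, second in itertools.product(short, repeat=2):
        phrase = first + second
        if len(phrase) == length:
            codes.add("#" + phrase.translate(TO_HEX))
    return sorted(codes)


print(hex_phrases(["dad", "bod", "bad", "decade", "zebra", "a", "idea"]))
```

Filtering first keeps the product small: squaring the whole dictionary would mean roughly ten billion pairs for a 100,000-word list.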



Testing against a list of all Wikipedia article titles is indeed also an avenue worth exploring, and I hope you explore it.


I installed the American English large dictionary on Ubuntu. It has `bod`.


Nice! I'm just using the 102'401-entry version.


Wish I could say the same.


Is HEX another of these words which gets erroneously capitalised, like SCRUM or GAP analysis?


I've noticed that for years in embedded (where we use "Intel HEX" formatted files) but I ascribed it to a field full of eccentric loners doing idiosyncratic things, or some kind of DOS 8.3 brain damage.


Or ELO score?


Does anyone have a link to a guide on how to write Python or Node or Rust programs that behave well with bash? I.e., streaming inputs and outputs, and other things I probably don’t know about?


It’s pretty easy. You have three basic streams:

1. Stdin - just iterate through sys.stdin

2. Stdout - regular printing will go there

3. Stderr - print errors here eg with print(…, file=sys.stderr)

And then beyond that, as long as your script gets invoked by the interpreter (i.e. #!/usr/bin/env python), everything will “just work”.
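A minimal filter along those lines, as a sketch (the file name `hexwords.py` in the usage note is hypothetical): the conversion is a plain generator over lines, results go to stdout, and the summary goes to stderr so it never pollutes the pipe.

```python
#!/usr/bin/env python3
"""Filter: words on stdin -> hex colour codes on stdout."""
import re
import sys

HEX_LIKE = re.compile(r"[a-foi]{6}", re.IGNORECASE)
TO_HEX = str.maketrans("abcdefoiOI", "ABCDEF0101")


def convert(lines):
    """Yield '#XXXXXX' for each line that spells a hex-like word."""
    for line in lines:
        word = line.rstrip("\n")
        if HEX_LIKE.fullmatch(word):
            yield "#" + word.translate(TO_HEX)


def main():
    count = 0
    for code in convert(sys.stdin):  # iterating sys.stdin reads lines until EOF
        print(code)  # data goes to stdout for the next stage of the pipe
        count += 1
    # Diagnostics go to stderr so they never mix with the data on stdout.
    print(f"{count} matches", file=sys.stderr)


if __name__ == "__main__":
    main()
```

Usage: `chmod +x hexwords.py && ./hexwords.py < /usr/share/dict/words | head`.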


Don’t you also have to keep in mind how often you flush outputs/how you buffer? Encoding? Handle EOF correctly?

Not saying it’s hard but also it’s not 100% covered by what you said.


Those are advanced topics and you can look them up if you need them.

Generally, Python does the right thing by default for scripting use: line buffered on a terminal (block buffered when stdout is a pipe), system encoding, EOF handled naturally by the iterator protocol.
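On the buffering point, a sketch of the two things that tend to matter in pipelines: flushing per line when a downstream reader needs output immediately, and exiting quietly when the reader goes away early (e.g. `| head`). The `BrokenPipeError` handling follows the pattern described in the Python documentation's note on SIGPIPE; `emit` is a hypothetical helper, and this filter just passes lines through.

```python
import os
import sys


def emit(lines, live=False):
    """Write lines to stdout and return how many were written.

    stdout is block buffered when it is a pipe, so pass live=True if the
    next program in the pipeline needs each line as soon as it is produced.
    """
    count = 0
    for line in lines:
        print(line, flush=live)
        count += 1
    return count


def main():
    try:
        emit(line.rstrip("\n") for line in sys.stdin)
        sys.stdout.flush()  # surface a broken pipe here, inside the try
    except BrokenPipeError:
        # The reader exited early. Point stdout at devnull so the final
        # flush at interpreter shutdown does not raise a second time.
        devnull = os.open(os.devnull, os.O_WRONLY)
        os.dup2(devnull, sys.stdout.fileno())
        sys.exit(1)


if __name__ == "__main__":
    main()
```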


And preferably use fileinput for the stdin so that you can name files on the command line as well
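A sketch with `fileinput`: with no arguments it reads stdin, otherwise each named file in turn, and `fileinput.filename()` / `fileinput.filelineno()` report where each line came from. The grep-style output format is just an illustrative choice.

```python
import fileinput
import re

HEX_LIKE = re.compile(r"[a-foi]{6}", re.IGNORECASE)
TO_HEX = str.maketrans("abcdefoiOI", "ABCDEF0101")


def hex_code(word):
    """Return '#XXXXXX' if word reads as a hex colour, else None."""
    word = word.strip()
    return "#" + word.translate(TO_HEX) if HEX_LIKE.fullmatch(word) else None


def main():
    # fileinput.input() reads every file named in sys.argv[1:] in turn, or
    # stdin when no files are given ("-" also means stdin), much like cat.
    with fileinput.input() as lines:
        for line in lines:
            code = hex_code(line)
            if code:
                # Report where each match came from, grep -n style.
                print(f"{fileinput.filename()}:{fileinput.filelineno()}:{code}")


if __name__ == "__main__":
    main()
```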


And avoid seek. Pipes are not random access. I once tried to use a Python library to convert a file from stdin, but it failed on an f.seek(0) that the library added 'just in case' at the beginning.
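When a library insists on seeking, one workaround (a sketch, not a fix for the library) is to check `seekable()` and, for a pipe, read the whole input into an in-memory `io.BytesIO`. That trades streaming for memory, so it only suits inputs that fit in RAM; `ensure_seekable` is a hypothetical name.

```python
import io
import sys


def ensure_seekable(stream):
    """Return a seekable binary stream with the same content as `stream`.

    Pipes are not seekable, so buffer them in memory; regular files and
    BytesIO objects are returned unchanged.
    """
    if stream.seekable():
        return stream
    return io.BytesIO(stream.read())


if __name__ == "__main__":
    data = ensure_seekable(sys.stdin.buffer)
    data.seek(0)  # now safe even when stdin is a pipe
    print(len(data.read()), "bytes")
```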


My book Data Science at the Command Line has a chapter about this that scratches the surface and lists some resources in case you want to dive deeper [1]. I can also recommend checking out packages such as Rich [2] and Click [3], if only to get an idea of the possibilities when it comes to creating command-line tools with Python.

[1] https://datascienceatthecommandline.com/2e/chapter-4-creatin...

[2] https://github.com/Textualize/rich

[3] https://click.palletsprojects.com/en/8.1.x/


This is oddly something that some of the earliest Node interfaces do quite well. (I say “oddly” because Node was mostly promoted early on for network/server use cases.) It’s generally not idiomatic in these days of async/await and Web Streams, but streaming IO was a core async primitive from very early on. 0.1.90 for child processes, unspecified for the main process object so possibly from the first release. Granted the interfaces really show their age in terms of incidental complexity, they’re far from being as simple as their shell equivalents. But as far as behaving well, streaming is solid and there’s a wealth of compatibility affordances depending on how portable your script needs to be.


For Python using fileinput module goes long way: https://docs.python.org/3/library/fileinput.html


With argparse.FileType, similar behavior integrates well with argparser https://docs.python.org/3/library/argparse.html#argparse.Fil...
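A sketch of that pattern: `FileType("r")` opens a named file and maps the special name `-` to stdin, and `nargs="?"` with `default=sys.stdin` makes the argument optional. The argument name `wordlist` is illustrative, and the loop only echoes lines where the real conversion would go.

```python
import argparse
import sys


def build_parser():
    parser = argparse.ArgumentParser(description="Turn words into hex colour codes.")
    # FileType("r") opens the named file; the special name "-" means stdin.
    # With nargs="?" and default=sys.stdin, omitting the argument reads stdin too.
    parser.add_argument(
        "wordlist",
        nargs="?",
        type=argparse.FileType("r"),
        default=sys.stdin,
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    for line in args.wordlist:
        print(line.rstrip("\n"))  # placeholder: convert the word here
```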


Reminds me of debugging pointer values in C with 0xDEADBEEF.


I appreciate the presence of #C0FFEE.

Can't do computing without that!! :-)


That color doesn’t look healthy though. ;)


Similarly, a list of hex words https://jott.live/code/hex_words


Fun idea. Perhaps you could stretch it a little, like we did on calculators, and add 5 for S, or even 7 for T, but that would likely be a bit less readable.


I added a comment for 5 vs S. 7/T looks like it's a bit too much :D
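One way to make the 5/S and 7/T question a switch rather than a rewrite, sketched in Python: the substitutions live in a dict, and the regex and translation table are built from it. The dict names and sample words are assumptions for illustration.

```python
import re

# Letter -> digit substitutions. BASE mirrors the gist (O->0, I->1);
# EXTRAS are the optional "calculator" stretches.
BASE = {"o": "0", "i": "1"}
EXTRAS = {"s": "5", "t": "7"}


def hex_words(words, subs):
    """Return colour codes for six-letter words spelled with a-f plus `subs`."""
    letters = "abcdef" + "".join(subs)
    pattern = re.compile(f"[{letters}]{{6}}")
    table = str.maketrans(subs)  # dict of single characters -> replacements
    codes = []
    for word in words:
        word = word.strip().lower()
        if pattern.fullmatch(word):
            codes.append("#" + word.translate(table).upper())
    return codes


sample = ["coffee", "seabed", "attest", "decade"]
print(hex_words(sample, BASE))                # ['#C0FFEE', '#DECADE']
print(hex_words(sample, {**BASE, **EXTRAS}))  # ['#C0FFEE', '#5EABED', '#A77E57', '#DECADE']
```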


You could just do full 1337 speak.


Pager code is probably better: "143" = I love you, but 177427*711773 = what time. I don't miss those days. I never had a pager, and I managed to convince all my friends that they shouldn't either, by pager bombing them. Pagers are still in use (hospitals still use them, for instance), and they're plaintext over the air, so if you live near a place that uses pagers, you can get all the messages in real time. It's the frequency: it's in VHF (IIRC), so it goes places microwaves cannot. It's also low bandwidth, so the small spectrum carved out for it is usually enough for hundreds of pagers in the area.

And since there's no real place to mention this elsewhere: there's an HTML color bot on the fediverse (botsin.space) that periodically posts two colors that work as complements, as foreground and background and vice versa. I haven't seen it in a while, but our little instance has gotten popular, so the feed rate is up near a few hundred posts an hour to sift through.


Little town I frequently drive through has a population of 1337.

I always have a little giggle.


what town and country?


I like my pseudo-anonymity here.

It's in the US. Here's the census data to discover many occurrences of "1337"

https://www.census.gov/data/tables/time-series/demo/popest/2...

FWIW the town I'm talking about has a different population listed there, a little bit short. The road sign still says 1337, though, as of Thursday.


come to think of it, doing a separate list of toLower l -> 1 isn't a bad idea either...


It makes me happy that #ACAC1A is about the right colour for the flowers of the sweet acacia tree (a pale yellow).


I know this is only looking at single words, so would miss this, but I always like to work ABAD1DEA into PoC work.


I like this! I usually try to pick a word/set of words that relates to the subject matter I’m testing, or something off the top of my head when that fails. But ABAD1DEA is a great default for exploratory work.

This is also an 8-character string, which I had wrongly inferred from usage in existing code to be restricted to certain APIs, but I looked it up and it’s evidently part of CSS Color Module Level 4 and has wide browser support. The one-liner could trivially be expanded to support 8-character codes. Not sure how trivial multiple words would be; my gut says “reasonably so, but won’t feel quite so reasonable on one line”. Alas, I’m on mobile, so I’m not gonna try it right now.
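A sketch of the length extension, assuming the same O/I substitutions as the gist: accept every length CSS allows for hex colours (3, 4, 6 and 8 digits) instead of only 6. Multi-word phrases are left out; the sample words are illustrative.

```python
import re

# CSS Color Module Level 4 accepts #RGB, #RGBA, #RRGGBB and #RRGGBBAA.
VALID_LENGTHS = {3, 4, 6, 8}
HEX_LIKE = re.compile(r"[a-foi]+", re.IGNORECASE)
TO_HEX = str.maketrans("abcdefoiOI", "ABCDEF0101")


def css_hex_words(words):
    """Return colour codes for hex-like words of any CSS-valid length."""
    codes = []
    for word in words:
        word = word.strip()
        if len(word) in VALID_LENGTHS and HEX_LIKE.fullmatch(word):
            codes.append("#" + word.translate(TO_HEX))
    return codes


print(css_hex_words(["bad", "idea", "decade", "deadbeef", "faded", "abacus"]))
# ['#BAD', '#1DEA', '#DECADE', '#DEADBEEF']
```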


Just as RRGGBB has a three-character shorthand (RGB), you can use four characters too: RGBA as a shorthand for RRGGBBAA.
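A sketch of the expansion rule (each shorthand digit is doubled); `expand_shorthand` is a hypothetical helper name:

```python
import string


def expand_shorthand(code):
    """Expand #RGB / #RGBA to #RRGGBB / #RRGGBBAA; full-length codes pass through."""
    digits = code.lstrip("#")
    if not digits or any(ch not in string.hexdigits for ch in digits):
        raise ValueError(f"not a hex colour: {code!r}")
    if len(digits) in (3, 4):
        digits = "".join(ch * 2 for ch in digits)  # each digit is doubled
    elif len(digits) not in (6, 8):
        raise ValueError(f"unsupported length: {code!r}")
    return "#" + digits.upper()


print(expand_shorthand("#FAB"))   # #FFAABB
print(expand_shorthand("#1dea"))  # #11DDEEAA
```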


Not sure why this is being called a "Bash" one-liner. It will work with many shells. It will run noticeably faster in Dash, for example. Test it yourself. Linux chooses Dash for non-interactive use, like this one-line script, because it is faster than Bash.


Some examples of where one finds Dash (NetBSD-derived Almquist shell, or "ash") in Linux

   The git.kernel.org repository
   Slackware
   Debian 
   Ubuntu
   Gentoo
   Arch initramfs
   Alpine 
   Tiny Core 
   OpenWRT
   Any other distrib that uses Busybox
   Android

What the OP fails to mention is that this shell one-liner (cf. "Bash one-liner"), as written, requires GNU grep, thanks to "-P".

BusyBox grep does not have a "-P" option.

In the case of Android, Google uses NetBSD userland programs, e.g., grep, which also does not include PCRE, i.e., "-P".

https://coral.googlesource.com/android-core/+/3458bb6ce1d3e7...

https://git.kernel.org/pub/scm/utils/dash/dash.git/

   curl -O https://mirror.rackspace.com/archlinux/iso/2022.10.01/arch/boot/x86_64/initramfs-linux.img
   xz -dc < initramfs-linux.img|cpio -t|grep -m1 usr/bin/ash


It's written with `-P` but doesn't actually need it. Standard `-E` works just fine instead.


How many "professional" programmers even know the difference between BRE, ERE and PCRE?

Perhaps this is why use of regex is so controversial amongst a majority of "professional" programmers. They are trying to use PCRE for every pattern matching task, i.e., even ones where it is not necessary, whether within their programming language or with command-line utilities. This "Bash one-liner" is a simple example.

I have reviewed a number of books written about regular expressions and for the most part^1 they focus only on regex as implemented in popular programming languages. That almost invariably is PCRE or some form of PCRE-like pattern matching. There is little distinction, let alone acknowledgment, between PCRE/PCRE-like patterns and anything simpler.

Not being a "professional" programmer, I use regex every day, but I never (intentionally) use PCRE.^2 Too complicated for my tastes, not to mention slow if using backtracking.

1. I recall one older book that did include an incomplete table attempting to show which type of regex was used by various UNIX utilities in addition to what regex was used by popular programming languages of the day.

2. For programs that optionally link to a PCRE library, I recompile them without it.


> Linux chooses Dash for non-interactive use

That entirely depends on the Linux distro.


I don't like using multiple commands.

    mawk 'BEGIN{b = "[abcdefois]"; l = "[a-z]"; W = "^" b l l l l l "$"}; $0 ~ W {print "#" toupper($0);}' /usr/share/dict/words


I came up with:

    gawk 'BEGIN {IGNORECASE=1} ((length($1) == 6) && /^[a-fois]+$/) {gsub(/o/,0);gsub(/i/,1);gsub(/s/,5); print toupper("#"$1)}' /usr/share/dict/words

(caveat: it does not filter out duplicates)


You can also do it entirely in sed:

    sed -E -e '/^[a-fio]{6}$/!d; y/abcdefioIO/ABCDEF1010/; s/^/#/' /usr/share/dict/words


This produces 35 items. The grep version gives 93


Yeah, I failed to make the pattern case insensitive.

Here's a fixed version that also handles S/5:

    sed -E -e '/^[A-FIOSa-fios]{6}$/!d; y/abcdefiosIOS/ABCDEF105105/; s/^/#/' /usr/share/dict/words


you also aren't going to get valid color codes


I wanted a t-shirt that is the color #FAB; and says #FAB; on it. I thought it'd be a fun one for digital artists, but then I found out how hard it would be to get a t-shirt that matches the color just right.


Fun fact: Every Java .class file starts with the magic bytes C0FEBABE


CAFEBABE

"...We used to go to lunch at a place called St Michael’s Alley. According to local legend, in the deep dark past, the Grateful Dead used to perform there before they made it big. It was a pretty funky place that was definitely a Grateful Dead Kinda Place. When Jerry died, they even put up a little Buddhist-esque shrine. When we used to go there, we referred to the place as Cafe Dead. Somewhere along the line, it was noticed that this was a HEX number. I was re-vamping some file format code and needed a couple of magic numbers: one for the persistent object file, and one for classes. I used CAFEDEAD for the object file format, and in grepping for 4 character hex words that fit after “CAFE” (it seemed to be a good theme) I hit on BABE and decided to use it. At that time, it didn’t seem terribly important or destined to go anywhere but the trash can of history. So CAFEBABE became the class file format, and CAFEDEAD was the persistent object format. But the persistent object facility went away, and along with it went the use of CAFEDEAD – it was eventually replaced by RMI...."

- James Gosling


I had the distinct pleasure of discovering CAFEBABE myself, in high school (not sure what direction this is dating myself in but I'll risk it), when I went on a tear of opening odd things in a hex editor.

Now I will never be able to see it without thinking of this story: https://aphyr.com/posts/341-hexing-the-technical-interview


I've been using that as my own alternative to DEADBEEF for years, I had no idea it was part of the official Java spec. Maybe it got lodged in my brain subconsciously at some point.


It's CAFEBABE



Interesting one-liner, but I would like to see the colors it generates.


If your terminal does 24-bit colour, and your shell is bash or ksh or zsh or close,

    sed -n -e 'y/abcdefOoIi/ABCDEF0011/' -e '/^[A-F01]\{6\}$/p' /usr/share/dict/words | while read -r c; do printf '\033[38;2;%d;%d;%dm#%s\033[0m\n' $((0x${c:0:2})) $((0x${c:2:2})) $((0x${c:4})) "$c"; done
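The same 24-bit escape sequence from Python, as a sketch: `ESC[38;2;R;G;Bm` sets the foreground colour and `ESC[0m` resets it (use 48 instead of 38 for a background swatch). It assumes a terminal with truecolor support; `swatch` is a hypothetical helper name.

```python
def swatch(code):
    """Return `code` wrapped in a 24-bit ANSI foreground colour escape."""
    hex_digits = code.lstrip("#")
    r, g, b = (int(hex_digits[i:i + 2], 16) for i in (0, 2, 4))
    # ESC[38;2;R;G;Bm sets a truecolor foreground; ESC[0m resets attributes.
    return f"\033[38;2;{r};{g};{b}m#{hex_digits}\033[0m"


if __name__ == "__main__":
    for code in ["#C0FFEE", "#DECADE", "#ACAC1A"]:
        print(swatch(code))
```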


View colors here

https://codepen.io/srcreigh/pen/QWrrgdx

Code thanks to gabrielsroka on the GitHub thread


oh wow, #seabed generated a beautiful blue. what a truly happy accident!


Acacia is green, and fesses (buttocks in French) is pink. Coocoo is the only red, surrounded by violets, and sobbed is a transparent-y blue like a tear :)


Access is green, acidic is red, and my favourite, cabbie is a nice yellow!



Not long ago I saw a link here to a site with the words and the colors...




What about 7 for T and also 3 for E?


E is a legit hex character:

0123456789ABCDEF

isn't it?

The 3 for E in 1337 speak was on numerical calculators that didn't display letters.


Using 3 you can get more colors with human readable names, and maybe pick the canonical color for any given word based on some criteria of interestingness.
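A sketch of that idea: since E is already a hex digit, treat 3 as an optional alternative, enumerate every spelling with `itertools.product`, then pick one by some rule. The "fewest digits" rule below is only a placeholder for whatever interestingness criterion you prefer, and the mapping is assumed.

```python
from itertools import product

# Characters each letter may be written as. E is already a hex digit, so 3 is
# an optional alternative; O and I must become 0 and 1 to be valid hex.
CHOICES = {"e": "E3", "o": "0", "i": "1"}


def spellings(word):
    """Return every hex spelling of a word made of a-f, o and i."""
    options = [CHOICES.get(ch, ch.upper()) for ch in word.lower()]
    return ["#" + "".join(combo) for combo in product(*options)]


def canonical(word):
    """Placeholder criterion: prefer the spelling with the fewest digits."""
    return min(spellings(word), key=lambda code: sum(ch.isdigit() for ch in code))


print(spellings("decide"))  # ['#DEC1DE', '#DEC1D3', '#D3C1DE', '#D3C1D3']
print(canonical("decide"))  # #DEC1DE
```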


No 7 for a T?



