Bash one-liner to produce a list of HEX color codes that read like English words

kragen · on Oct 1, 2022

I tried this a few years ago; http://canonical.org/~kragen/sw/dev3/colors.html has them as foreground colors and http://canonical.org/~kragen/sw/dev3/colors.2.html has them as background colors. I tested 3-letter words as well as 6-letter words, and used 1 as "l" as well as "I", but I didn't try aghasemi's very productive suggestion of using 5 as S. I don't remember if it didn't occur to me or if I tried it and didn't like the results.

Some of them are pretty #bad (#011 doesn't really look much like "oil") and some, though they read quite well, correspond to awful colors; you might even say, #faeca1 colors. Still, I've made my #bed, #0dd as it may be; now I must #11e in it. I think I've #fed you enough #babb1e for today.

js2 · on Oct 1, 2022

The gist is rather a pipeline of Unix commands with no bash necessarily involved. Here it is in shellcheck-compliant 100% bash:

    #!/usr/bin/env bash
    shopt -s nocasematch
    while read -r word; do
        if [[ $word =~ ^[abcdefoi]{6,6}$ ]]; then
            word=${word//o/0}
            word=${word//i/1}
            word=${word^^}
            printf '#%s\n' "$word"
        fi
    done < /usr/share/dict/words

This could be collapsed to one line with semicolons. On the macOS 12.6 dictionary I get 59 words.

Edit: and in sed which someone just asked me for elsewhere:

    sed -n -e '
    /^[abcdefoi]\{6,6\}$/I {
    s/o/0/g;
    s/i/1/g;
    s/^/#/;
    y/abcdef/ABCDEF/;
    p;}' < /usr/share/dict/words

kps · on Oct 2, 2022

     sed -n -e 'y/abcdefOoIi/ABCDEF0011/' -e 's/^[A-F01]\{6\}$/#&/p' /usr/share/dict/words

js2 · on Oct 2, 2022

I’d golf with you but I think you got a hole in one there. I didn’t spend any time thinking about how to make the sed more compact. I sorta just translated what I’d already written in bash.

version_five · on Oct 1, 2022

Thanks for this. I'd probably call the original the GNU coreutils version. The linked github also has a sed-only version in the comments. It's instructive to see the different versions.

kps · on Oct 2, 2022

> I'd probably call the original the GNU coreutils version.

Why? The only GNUish bit is the grep -P option, which is unnecessary (-E will do as well).

version_five · on Oct 2, 2022

I would have considered tr to be part of GNU coreutils. awk, not necessarily, but the default on a Mac is gawk, I believe.

kragen · on Oct 2, 2022

tr predates GNU by about a decade.

js2 · on Oct 2, 2022

I just added a sed version as well. I'll have to click through and see how closely it resembles what's in the gist.

bash is actually pretty powerful if you don't mind its baroque syntax. Writing it in POSIX would be a bit more challenging. You could use a case statement for the pattern matching, but I'm not sure about the substitution.

nine_k · on Oct 1, 2022

Never mind the colors.

This snippet demonstrates how a number of small tools, each doing its narrow job, strung together via the most trivial interface, produce a non-trivial result.

This composability is still unreachable to the vast majority of GUI tools.

vesinisa · on Oct 1, 2022

The non-trivial part here is actually the source data (the dict file.) It is also its pitfall - after adding 5 for S you should see a litany of plurals. Most dict files (for English anyway) however seem to omit plural nouns. I guess the logic is that in English most plurals are regular, and the naive algorithm for deriving them from the singular forms (correctly most of the time) is quite trivial.

throwing_away · on Oct 1, 2022

SaaS companies hate this one weird trick!

miohtama · on Oct 1, 2022

While it is a neat trick as a one-liner, I would recommend against doing anything like this in any software that requires maintenance. The code is hard or impossible to follow and has no comments. It is brittle, and only a few people can understand what it really does. A better option would be 10 lines of Python or JavaScript with some comments.

kragen · on Oct 1, 2022

I thought it was trivial to understand, though the comment above it helps a lot, and it's maybe an unfair advantage that I'd done the same thing in pretty much the same way four years ago. It probably depends on your background; I wouldn't write it that way for people who didn't know shell, just like I wouldn't write this comment in English for people who speak only Spanish.

I'm not convinced that it's easier to understand in Python (even though I simplified it a bit, in part because one piece of the Python 3 braindamage was moving string.maketrans to bytes):

    import re


    def main(words):
        for word in words:
            word = word.strip().upper()
            if re.compile(r'[A-FOI]{6}$').match(word):
                print('#' + word.replace('I', '1').replace('O', '0'))

    if __name__ == '__main__':
        main(open('/usr/share/dict/words'))

I think the shell version is clearly better for interactive improvisation, though.
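For comparison, here is a minimal sketch of the same filter built on `str.translate`. In Python 3, `str.maketrans` is available as a static method on `str`. The helper name `hex_words` and the sample words are illustrative assumptions, not part of the original gist.

```python
import re

# Translation table: letter O becomes digit 0, letter I becomes digit 1.
TABLE = str.maketrans("OI", "01")
HEX_LIKE = re.compile(r"[A-FOI]{6}")


def hex_words(words):
    """Yield '#RRGGBB'-style codes for six-letter words spelled with A-F, O, I."""
    for word in words:
        word = word.strip().upper()
        if HEX_LIKE.fullmatch(word):
            yield "#" + word.translate(TABLE)


if __name__ == "__main__":
    # Hypothetical sample input; pass open('/usr/share/dict/words') instead.
    for code in hex_words(["coffee\n", "babble\n", "office\n", "zebra\n"]):
        print(code)
```

The table is built once, so each word needs a single `translate` call instead of a chain of `replace` calls.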

js2 · on Oct 2, 2022

I prefer search with an explicit '^' in the pattern to using match. For a throw-away script I'd probably do this:

    import re
    is_hex_like = re.compile(r"^[a-foi]{6}$", re.I).search
    for word in filter(is_hex_like, open("/usr/share/dict/words")):
        hexword = word.upper().replace("O", "0").replace("I", "1").rstrip()
        print(f"#{hexword}")

Too · on Oct 2, 2022

findall and multiline mode make it even easier, at the cost of loading the whole file into memory, though. For that reason your alternative is probably better:

    import re
    wordlist = open("/usr/share/dict/words").read()
    for word in re.findall(r"^[a-foi]{6}$", wordlist, re.IGNORECASE | re.MULTILINE):
        hexword = word.upper().replace("O", "0").replace("I", "1")
        print(f"#{hexword}")

kragen · on Oct 2, 2022

That's nicer than my version! I'm curious why you prefer search(), though.

js2 · on Oct 2, 2022

1. I don't have to remember which implicitly anchors to the start of the string and which doesn't.

2. I prefer the explicitness of '^' (maybe that's just another way of stating (1)).

3. I can use re.M to modify '^' to match at the start of each line on multiline strings, whereas match still only matches at the very start of the string.

4. The asymmetry of anchoring the front but not the end is weird. Python now has fullmatch, but ugh, just use the pattern for that if you need it.

5. Off the top of my head, I can't think of another language that has a regex function that implicitly anchors the front.
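A small sketch of the difference between the three functions, using a made-up two-line string rather than a real word list:

```python
import re

text = "zebra\ncoffee\n"  # hypothetical sample input
pattern = r"^[a-foi]{6}$"

# match() only tries position 0, so re.M does not help it reach line two.
at_start = re.match(pattern, text, re.I | re.M)

# search() scans forward, and re.M lets '^' and '$' anchor at line boundaries.
anywhere = re.search(pattern, text, re.I | re.M)

# fullmatch() anchors both ends implicitly, so no '^' or '$' is needed.
whole = re.fullmatch(r"[a-foi]{6}", "coffee", re.I)

print(at_start)           # None
print(anywhere.group(0))  # coffee
print(whole.group(0))     # coffee
```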

kragen · on Oct 3, 2022

Hmm, I see. Interesting! I think of regexps as state machines, so I think of the implicit loop to find a starting position as extra complexity, which can give rise to for example performance problems, though it's true that in many languages you can't avoid it.

rascul · on Oct 1, 2022

Comments can be added. Understanding it requires learning the tools. Just like understanding python or javascript requires learning python or javascript. It's not impossible to follow.

lrvick · on Oct 1, 2022

I understood it instantly on first read. Probably depends on how much shell you write.

kupopuffs · on Oct 1, 2022

Ah yes, the Unix Way

pwpwp · on Oct 1, 2022

It's missing #DADB0D

kragen · on Oct 1, 2022

I look forward to your improved version that tests against the Cartesian product of /usr/dict/words with itself plus the empty string and maybe some slang words like "bod". I suggest you limit to shortish words before the Cartesian product rather than after.
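A rough sketch of that idea, assuming a plain iterable of words as input. The function name `two_word_codes` and the `extra` slang list are hypothetical, and the empty-string case is left out because single six-letter words are already covered by the one-word versions in this thread. Short hex-like words are filtered before the Cartesian product, as suggested.

```python
import re
from itertools import product

HEX_LIKE = re.compile(r"[a-foi]+", re.I)
TABLE = str.maketrans("OI", "01")


def two_word_codes(words, extra=("bod",), length=6):
    """Yield (code, first, second) for word pairs that spell a hex color."""
    # Filter down to short hex-like words *before* taking the product.
    candidates = {w.strip().lower() for w in words} | set(extra)
    short = sorted(
        w for w in candidates if len(w) < length and HEX_LIKE.fullmatch(w)
    )
    for first, second in product(short, repeat=2):
        if len(first) + len(second) == length:
            code = "#" + (first + second).upper().translate(TABLE)
            yield code, first, second


if __name__ == "__main__":
    # Hypothetical sample; a real run would pass the dictionary file.
    for code, first, second in two_word_codes(["dad", "bad", "zebra"]):
        print(code, first, second)
```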

mellosouls · on Oct 1, 2022

https://en.wikipedia.org/wiki/Dad_bod

kragen · on Oct 1, 2022

Testing against a list of all Wikipedia article titles is indeed also an avenue worth exploring, and I hope you explore it.

gabrielsroka · on Oct 1, 2022

I installed the American English large dictionary on Ubuntu. It has `bod`.

kragen · on Oct 1, 2022

Nice! I'm just using the 102'401-entry version.

kgwxd · on Oct 2, 2022

Wish I could say the same.

b800h · on Oct 1, 2022

Is HEX another of these words which gets erroneously capitalised, like SCRUM or GAP analysis?

markrages · on Oct 2, 2022

I've noticed that for years in embedded (where we use "Intel HEX" formatted files) but I ascribed it to a field full of eccentric loners doing idiosyncratic things, or some kind of DOS 8.3 brain damage.

teo_zero · on Oct 2, 2022

Or ELO score?

Waterluvian · on Oct 1, 2022

Does anyone have a link to a guide on how to write Python or node or rust programs that behave well with bash? I.e., streaming inputs and outputs and other things I probably don’t know about?

KMnO4 · on Oct 1, 2022

It’s pretty easy. You have three basic streams:

1. Stdin - just iterate through sys.stdin

2. Stdout - regular printing will go there

3. Stderr - print errors here eg with print(…, file=sys.stderr)

And then beyond that, as long as your script gets invoked by the interpreter (i.e. #!/usr/bin/env python) everything will “just work”.
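The three streams above can be sketched as a small filter. This is only an illustration under assumptions: the script name `hexwords.py` and the function names are made up, and it reuses the hex-word conversion from this thread as the payload.

```python
#!/usr/bin/env python3
import re
import sys

HEX_LIKE = re.compile(r"^[a-foi]{6}$", re.I)


def convert(lines):
    """Yield a '#RRGGBB'-style code for every hex-like word in lines."""
    for line in lines:  # iteration simply stops at EOF
        word = line.strip()
        if HEX_LIKE.search(word):
            yield "#" + word.upper().replace("O", "0").replace("I", "1")


def main():
    count = 0
    for code in convert(sys.stdin):  # 1. stdin: iterate line by line
        print(code)                  # 2. stdout: the data other tools consume
        count += 1
    # 3. stderr: diagnostics, kept out of the pipeline's data
    print(f"{count} words converted", file=sys.stderr)


if __name__ == "__main__":
    main()
```

Saved as the hypothetical `hexwords.py`, it could sit in a pipeline such as `python3 hexwords.py < /usr/share/dict/words | sort -u`.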

IgorPartola · on Oct 2, 2022

Don’t you also have to keep in mind how often you flush outputs/how you buffer? Encoding? Handle EOF correctly?

Not saying it’s hard but also it’s not 100% covered by what you said.

markrages · on Oct 2, 2022

Those are advanced topics and you can look them up if you need them.

Generally, Python does the right thing by default for scripting use: line buffered, system encoding, EOF handled naturally by the iterator protocol.

gnubison · on Oct 2, 2022

And preferably use fileinput for the stdin so that you can name files on the command line as well

Calzifer · on Oct 2, 2022

And avoid seek. Pipes are not random access. I once tried to use a python library to convert a file from stdin but it failed on a f.seek(0) the library added 'just in case' in the beginning.

jeroenjanssens · on Oct 2, 2022

My book Data Science at the Command Line has a chapter about this that scratches the surface and lists some resources in case you want to dive deeper [1]. I can also recommend checking out packages such as Rich [2] and Click [3], if only to get an idea of the possibilities when it comes to creating command-line tools with Python.

[1] https://datascienceatthecommandline.com/2e/chapter-4-creatin...

[2] https://github.com/Textualize/rich

[3] https://click.palletsprojects.com/en/8.1.x/

eyelidlessness · on Oct 1, 2022

This is oddly something that some of the earliest Node interfaces do quite well. (I say “oddly” because Node was mostly promoted early on for network/server use cases.) It’s generally not idiomatic in these days of async/await and Web Streams, but streaming IO was a core async primitive from very early on. 0.1.90 for child processes, unspecified for the main process object so possibly from the first release. Granted the interfaces really show their age in terms of incidental complexity, they’re far from being as simple as their shell equivalents. But as far as behaving well, streaming is solid and there’s a wealth of compatibility affordances depending on how portable your script needs to be.

zokier · on Oct 1, 2022

For Python, using the fileinput module goes a long way: https://docs.python.org/3/library/fileinput.html
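A minimal sketch of what that looks like, with a hypothetical helper `to_code` standing in for the real work. `fileinput.input()` reads the files named on the command line and falls back to stdin when none are given.

```python
import fileinput
import re

HEX_LIKE = re.compile(r"^[a-foi]{6}$", re.I)


def to_code(line):
    """Return a '#RRGGBB'-style code for a hex-like word, else None."""
    word = line.strip()
    if HEX_LIKE.search(word):
        return "#" + word.upper().replace("O", "0").replace("I", "1")
    return None


if __name__ == "__main__":
    # Reads every file in sys.argv[1:], or stdin if there are no arguments
    # (a lone "-" also means stdin).
    for line in fileinput.input():
        code = to_code(line)
        if code:
            print(code)
```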

Too · on Oct 2, 2022

With argparse.FileType, similar behavior integrates well with argparse https://docs.python.org/3/library/argparse.html#argparse.Fil...

netule · on Oct 1, 2022

Reminds me of debugging pointer values in C with 0xDEADBEEF.

dwheeler · on Oct 1, 2022

I appreciate the presence of #C0FFEE.

Can't do computing without that!! :-)

layer8 · on Oct 1, 2022

That color doesn’t look healthy though. ;)

brrrrrm · on Oct 1, 2022

Similarly, a list of hex words https://jott.live/code/hex_words

silisili · on Oct 1, 2022

Fun idea. Perhaps could stretch a little like we did in calculators and add 5 for S, or even 7 for T, but that would likely be a bit less readable.

ghasemi · on Oct 1, 2022

I added a comment for 5 vs S. 7/T looks like it's a bit too much :D

bawolff · on Oct 1, 2022

You could just do full 1337 speek.

genewitch · on Oct 2, 2022

pager code, probably better. "143" = I love you; but 177427*711773 = what time. I don't miss those days. I never had a pager, and i managed to convince all my friends that they shouldn't, either, by pager bombing them. Pagers are still in use, and they're plaintext over the air so if you live near a place that uses pagers (hospitals still use them, for instance), you can get all the messages in real time. It's the frequency. It's in VHF (iirc) so it goes places microwaves cannot; it's also low bandwidth, so the small spectrum carved out for it is usually enough for hundreds of pagers in the area.

And since there's no real place to mention this elsewhere, there's an HTML color bot on the fediverse (botsin.space) that periodically posts two colors that work as complements as foreground and background, and vice versa. I haven't seen it in a while, but our little instance has gotten popular so the feed rate is up near a few hundred posts an hour to sift through.

mod · on Oct 1, 2022

Little town I frequently drive through has a population of 1337.

I always have a little giggle.

hoyd · on Oct 2, 2022

what town and country?

mod · on Oct 2, 2022

I like my pseudo-anonymity here.

It's in the US. Here's the census data to discover many occurrences of "1337"

https://www.census.gov/data/tables/time-series/demo/popest/2...

FWIW the town I'm talking about has a different population listed there, a little bit short. The road sign still says 1337, though, as of Thursday.

silisili · on Oct 1, 2022

come to think of it, doing a separate list of toLower l -> 1 isn't a bad idea either...

Yenrabbit · on Oct 2, 2022

It makes me happy that #ACAC1A is about the right colour for the flowers of the sweet acacia tree (a pale yellow).

dspillett · on Oct 1, 2022

I know this is only looking at single words, so would miss this, but I always like to work ABAD1DEA into PoC work.

eyelidlessness · on Oct 1, 2022

I like this! I usually try to pick a word/set of words that relates to the subject matter I’m testing, or something off the top of my head when that fails. But ABAD1DEA is a great default for exploratory work.

This is also an 8 character string, which I had wrongly inferred from usage in existing code to be restricted to certain APIs, but I looked it up and it’s evidently part of CSS Color Module Level 4 and has wide browser support. The one-liner could trivially be expanded to support 8-character codes. Not sure how trivial multiple words would be, my gut says “reasonably so but won’t feel quite so reasonable on one line”. Alas I’m on mobile so I’m not gonna try it right now.
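As a sketch of that expansion (not the gist's actual code), the length check can become a set of allowed lengths. CSS hex notation accepts 3, 4, 6, or 8 digits; the function name `css_hex_words` and the sample words are illustrative assumptions.

```python
import re

# CSS hex colors may be #RGB, #RGBA, #RRGGBB, or #RRGGBBAA.
CSS_LENGTHS = (3, 4, 6, 8)
HEX_LIKE = re.compile(r"[a-foi]+", re.I)
TABLE = str.maketrans("OI", "01")


def css_hex_words(words, lengths=CSS_LENGTHS):
    """Yield hex-like words whose length is a valid CSS hex color length."""
    for word in words:
        word = word.strip()
        if len(word) in lengths and HEX_LIKE.fullmatch(word):
            yield "#" + word.upper().translate(TABLE)


if __name__ == "__main__":
    # Hypothetical sample; a real run would pass the dictionary file.
    sample = ["bad", "cafe", "coffee", "feedface", "added", "zebra"]
    print(list(css_hex_words(sample)))
```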

dspillett · on Oct 2, 2022

Just as RRGGBB has a three-character shorthand, you can use four characters too: RGBA as a shorthand for RRGGBBAA.

1vuio0pswjnm7 · on Oct 2, 2022

Not sure why this is being called "Bash" one-liner. It will work with many shells. It will run noticeably faster in Dash, for example. Test it yourself. Linux chooses Dash for non-interactive use, like this one-line script, because it is faster than Bash.

1vuio0pswjnm7 · on Oct 2, 2022

Some examples of where one finds Dash (NetBSD-derived Almquist shell, or "ash") in Linux

   The git.kernel.org repository
   Slackware
   Debian 
   Ubuntu
   Gentoo
   Arch initramfs
   Alpine 
   Tiny Core 
   OpenWRT
   Any other distrib that uses Busybox
   Android

What the OP fails to mention is that this shell one-liner (cf. "Bash one-liner"), as written, requires GNU grep, thanks to "-P".

BusyBox grep does not have a "-P" option.

In the case of Android, Google uses NetBSD userland programs, e.g., grep, which also does not include PCRE, i.e., "-P".

https://coral.googlesource.com/android-core/+/3458bb6ce1d3e7...

https://git.kernel.org/pub/scm/utils/dash/dash.git/

   curl -O https://mirror.rackspace.com/archlinux/iso/2022.10.01/arch/boot/x86_64/initramfs-linux.img
   xz -dc < initramfs-linux.img|cpio -t|grep -m1 usr/bin/ash

kps · on Oct 2, 2022

It's written with `-P` but doesn't actually need it. Standard `-E` works just fine instead.

1vuio0pswjnm7 · on Oct 2, 2022

How many "professional" programmers even know the difference between BRE, ERE and PCRE?

Perhaps this is why use of regex is so controversial amongst a majority of "professional" programmers. They are trying to use PCRE for every pattern matching task, i.e., even ones where it is not necessary, whether it is within their programming language or with command-line utilities. This "Bash one-liner" is a simple example.

I have reviewed a number of books written about regular expressions and for the most part^1 they focus only on regex as implemented in popular programming languages. That almost invariably is PCRE or some form of PCRE-like pattern matching. There is little distinction, let alone acknowledgment, between PCRE/PCRE-like patterns and anything simpler.

Not being a "professional" programmer, I use regex everyday but I never (intentionally) use PCRE.^2 Too complicated for my tastes, not to mention slow if using backtracking.

1. I recall one older book that did include an incomplete table attempting to show which type of regex was used by various UNIX utilities in addition to what regex was used by popular programming languages of the day.

2. For programs that optionally link to a PCRE library, I re-compile them without it.

LambdaComplex · on Oct 2, 2022

> Linux chooses Dash for non-interactive use

That entirely depends on the Linux distro.

ratsmack · on Oct 1, 2022

I don't like using multiple commands.

    mawk 'BEGIN{b = "[abcdefois]"; l = "[a-z]"; W = "^" b l l l l l "$"}; $0 ~ W {print "#" toupper($0);}' /usr/share/dict/words

kbr2000 · on Oct 1, 2022

I came up with:

  gawk 'BEGIN {IGNORECASE=1} ((length($1) == 6) && /^[a-fois]+$/) {gsub(/o/,0);gsub(/i/,1);gsub(/s/,5); print toupper("#"$1)}' /usr/share/dict/words

(caveat: it does not filter out duplicates)

adrianmonk · on Oct 1, 2022

You can also do it entirely in sed:

    sed -E -e '/^[a-fio]{6}$/!d; y/abcdefioIO/ABCDEF1010/; s/^/#/' /usr/share/dict/words

xertopertha · on Oct 2, 2022

This produces 35 items. The grep version gives 93

adrianmonk · on Oct 2, 2022

Yeah, I failed to make the pattern case insensitive.

Here's a fixed version that also handles S/5:

    sed -E -e '/^[A-FIOSa-fios]{6}$/!d; y/abcdefiosIOS/ABCDEF105105/; s/^/#/' /usr/share/dict/words

Keyframe · on Oct 1, 2022

you also aren't going to get valid color codes

kgwxd · on Oct 2, 2022

I wanted a t-shirt that is the color #FAB; and says #FAB; on it, thought it'd be a fun one for digital artists, then I found out how hard it would be to get a t-shirt that matches it just right.

teaearlgraycold · on Oct 1, 2022

Fun fact: Every Java .class file starts with the magic bytes C0FEBABE

belter · on Oct 1, 2022

CAFEBABE

"...We used to go to lunch at a place called St Michael’s Alley. According to local legend, in the deep dark past, the Grateful Dead used to perform there before they made it big. It was a pretty funky place that was definitely a Grateful Dead Kinda Place. When Jerry died, they even put up a little Buddhist-esque shrine. When we used to go there, we referred to the place as Cafe Dead. Somewhere along the line, it was noticed that this was a HEX number. I was re-vamping some file format code and needed a couple of magic numbers: one for the persistent object file, and one for classes. I used CAFEDEAD for the object file format, and in grepping for 4 character hex words that fit after “CAFE” (it seemed to be a good theme) I hit on BABE and decided to use it. At that time, it didn’t seem terribly important or destined to go anywhere but the trash can of history. So CAFEBABE became the class file format, and CAFEDEAD was the persistent object format. But the persistent object facility went away, and along with it went the use of CAFEDEAD – it was eventually replaced by RMI...."

- James Gosling

jrumbut · on Oct 1, 2022

I had the distinct pleasure of discovering CAFEBABE myself, in high school (not sure what direction this is dating myself in but I'll risk it), when I went on a tear of opening odd things in a hex editor.

Now I will never be able to see it without thinking of this story: https://aphyr.com/posts/341-hexing-the-technical-interview

TillE · on Oct 1, 2022

I've been using that as my own alternative to DEADBEEF for years, I had no idea it was part of the official Java spec. Maybe it got lodged in my brain subconsciously at some point.

tragomaskhalos · on Oct 1, 2022

It's CAFEBABE

cantSpellSober · on Oct 2, 2022

Similar: https://nedbatchelder.com/text/hexwords.html

nick0garvey · on Oct 1, 2022

Interesting one liner but would like to see the colors it generates

kps · on Oct 2, 2022

If your terminal does 24-bit colour, and your shell is bash or ksh or zsh or close,

    sed -n -e 'y/abcdefOoIi/ABCDEF0011/' -e '/^[A-F01]\{6\}$/p' /usr/share/dict/words | while read c; do printf '\033[38;2;%d;%d;%dm#%s\033[0m\n' $((0x${c:0:2})) $((0x${c:2:2})) $((0x${c:4})) $c; done
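For readers who would rather not parse the arithmetic expansion, here is a rough Python sketch of the same escape sequence. It assumes a terminal with 24-bit colour support; the function name `colorize` and the sample words are made up for illustration.

```python
def colorize(hexword):
    """Wrap '#RRGGBB' text in a 24-bit foreground-colour escape sequence."""
    digits = hexword.lstrip("#")
    # Split RRGGBB into three two-digit hex fields and convert to integers.
    red, green, blue = (int(digits[i:i + 2], 16) for i in (0, 2, 4))
    return f"\033[38;2;{red};{green};{blue}m#{digits}\033[0m"


if __name__ == "__main__":
    for sample in ("C0FFEE", "0FF1CE", "ACAC1A"):  # sample words only
        print(colorize(sample))
```

Swapping `38` for `48` in the escape sequence would set the background colour instead of the foreground.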

srcreigh · on Oct 1, 2022

View colors here

https://codepen.io/srcreigh/pen/QWrrgdx

Code thanks to gabrielsroka on the Github thread

blondin · on Oct 1, 2022

oh wow, #seabed generated a beautiful blue. what a truly happy accident!

cmehdy · on Oct 2, 2022

Acacia is green, and fesses (buttocks in French) is pink. Coocoo is the only red in a surrounding of violets, and sobbed is a transparent-y blue like a tear :)

srcreigh · on Oct 2, 2022

Access is green, acidic is red, and my favourite, cabbie is a nice yellow!

LanternLight83 · on Oct 1, 2022

https://gist.github.com/aileftech/dd4f5598b1f3837651fdf16e5a...

Silverback_VII · on Oct 1, 2022

Not long ago I saw a link here to a site with the words and the colors...

amenghra · on Oct 1, 2022

This maybe? https://news.ycombinator.com/item?id=31673662

styfle · on Oct 1, 2022

Also this https://news.ycombinator.com/item?id=14537747

pushedx · on Oct 2, 2022

What about 7 for T and also 3 for E?

jaclaz · on Oct 2, 2022

E is a legit hex character:

0123456789ABCDEF

isn't it?

The 3 for E in 1337 speak was on numerical calculators that didn't display letters.

pushedx · on Oct 4, 2022

Using 3 you can get more colors with human readable names, and maybe pick the canonical color for any given word based on some criteria of interestingness.

IgorPartola · on Oct 2, 2022

No 7 for a T?