I just wanted to give you a taste of how this works in an ungeneralized way before I go. Linear-time algorithms are fast and feel very different. I want to stress this is not directly related to the technique in the paper, but it is in the same "family" of algorithms, and it is very easy to understand.
Imagine you're doing the classic "write an algorithm to determine if one word is the anagram of another". The classic solution is to sort and compare the strings to be checked.
We can do this pretty elegantly for strings without using quicksort. It goes like this: allocate a block of 26 bytes. Each byte is a counter for one letter (0 -> A, 1 -> B, ...). Now sweep across the string, and each time you see a letter, increment its counter. If you really care about speed, you can go crazy with SIMD and other clever tricks here.
Once you've done this for both strings, you need only compare the two regions for equality (a constant-time operation).
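Here's a minimal sketch of that in Python (the helper names are mine, and I'm ignoring case-folding edge cases and the SIMD angle entirely):

    # Hypothetical sketch of the 26-counter trick described above.
    def letter_counts(word):
        """Return a 26-slot count vector for the letters a-z in `word`."""
        counts = [0] * 26
        for ch in word.lower():
            if 'a' <= ch <= 'z':              # ignore anything outside a-z for simplicity
                counts[ord(ch) - ord('a')] += 1
        return counts

    def is_anagram(a, b):
        """Two words are anagrams iff their letter-count vectors match."""
        return letter_counts(a) == letter_counts(b)

    print(is_anagram("listen", "silent"))   # True
    print(is_anagram("listen", "silence"))  # False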
The worst case, average case, and minimum running time of this algorithm is O(len_word_1 + len_word_2) + c, and because we can safely make assumptions about the strings (no ISO-Latin-1 word contains more than 255 of any single character, so each counter fits in a byte), we can do it fairly compactly. This is MUCH faster and can be done with perfect memory safety (which is to say, it is optimal with mutability, but all operations are commutative, so they may be subdivided, done in any order, and recombined at will).
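To make that "subdivide and recombine" point concrete, here's a toy illustration (again just a sketch with names of my choosing): partial counts over chunks of a string simply add, in any order.

    # Toy illustration of the commutativity claim: partial counts over chunks
    # of a string just add, so the work can be split up and recombined freely.
    def letter_counts(s):
        counts = [0] * 26
        for ch in s.lower():
            if 'a' <= ch <= 'z':
                counts[ord(ch) - ord('a')] += 1
        return counts

    def add_counts(a, b):
        return [x + y for x, y in zip(a, b)]

    word = "reconciliation"
    left, right = word[:7], word[7:]          # split anywhere you like
    assert add_counts(letter_counts(left), letter_counts(right)) == letter_counts(word)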
Try implementing this. It's really, really fast on modern hardware, especially if you're running it over a full /usr/share/dict/words file on a normal *nix/BSD installation.
We can even see that composability feature in our current problem! Imagine we have that big dictionary file and we want to find ALL the anagrams in it. Imagine we cut the above technique in half and compute the byte vector for every word in the dictionary (which is O(n) time). If we sort THESE vectors, we could just pick out all the equal values and say "these are the anagram groups", right?
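A rough sketch of what that could look like, assuming a standard, non-empty /usr/share/dict/words and glossing over case and punctuation handling:

    # Sketch of the "sort the count vectors" idea.
    def letter_counts(word):
        counts = [0] * 26
        for ch in word.lower():
            if 'a' <= ch <= 'z':
                counts[ord(ch) - ord('a')] += 1
        return tuple(counts)                  # tuples sort and compare nicely

    with open("/usr/share/dict/words") as f:
        words = [w.strip() for w in f if w.strip()]

    # O(n) pass to key every word, then one sort on the keys.
    keyed = sorted((letter_counts(w), w) for w in words)

    # Single sweep over the sorted list: runs of equal keys are anagram groups.
    groups, run = [], [keyed[0][1]]
    for (prev_key, _), (key, word) in zip(keyed, keyed[1:]):
        if key == prev_key:
            run.append(word)
        else:
            if len(run) > 1:
                groups.append(run)
            run = [word]
    if len(run) > 1:
        groups.append(run)

    print(len(groups), "anagram groups found")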
If we were to insert them into some very wide trie (as is the style these days), that'd be O(numwords * log(numwords)), which is really just sorting again. We'd do it in a minimum of 2 passes with O(n log n) cost. But if we have composable discriminators, we could do this in one big pass with O(n) cost! Our constant factors would grow because we're dealing with more memory. We could then not only sweep across the sorted list to extract groups (note the symmetry with the ABC counter approach above), but we could also pretend it's a tree structure (start at the length/2 element in the list and treat it as a binary tree) and search for new items inside the block, so we could check new words for anagrams later in O(log n) time.
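And a sketch of that follow-up lookup: once the keyed list is sorted, a new word's anagram group falls out of a binary search over the keys (here Python's bisect stands in for the hand-rolled "start at the middle element" walk described above; the helper and file are the same assumptions as before):

    # Treat the sorted keys as an implicit search tree and binary-search for
    # the run of dictionary words sharing a new word's count vector.
    import bisect

    def letter_counts(word):
        counts = [0] * 26
        for ch in word.lower():
            if 'a' <= ch <= 'z':
                counts[ord(ch) - ord('a')] += 1
        return tuple(counts)

    with open("/usr/share/dict/words") as f:
        words = [w.strip() for w in f if w.strip()]

    keyed = sorted((letter_counts(w), w) for w in words)
    keys = [k for k, _ in keyed]              # parallel list of keys for bisect

    def anagrams_of(word):
        """O(log n) search for the run of dictionary words sharing `word`'s key."""
        key = letter_counts(word)
        lo = bisect.bisect_left(keys, key)
        hi = bisect.bisect_right(keys, key)
        return [w for _, w in keyed[lo:hi]]

    print(anagrams_of("stop"))   # whatever anagrams of "stop" the dictionary holds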
The paper I linked talked about generalizing this idea (a special case of radix sort) and making it composable.