So, if I'm understanding this right, finger trees are just 2-3 trees with an extra array of 8 things per level (four for prefix, four for suffix). Does this really have a benefit over just using a 2-3 tree?
I don't think that's quite right -- have a close look at the translation of the example finger tree to code. (Grep for "The translation of this tree".)
The basic intuition is that the far left and right of the original tree have been 'lifted up' to just under the root. This means access to the tips is constant time instead of O(log(n)), which has obvious benefits for some access patterns. The really neat thing about this data structure, though, is all the other properties these trees have: you can implement random-access arrays, priority queues, and a bunch of other structures quite efficiently, just by tweaking a parameter or two.
You get amortized-constant-time additions and deletions at both "ends", which is a pretty big deal for some use cases. And log-time concatenation, which I don't think 2-3 trees give you. And it's all done purely functionally, which means you get persistence if you want it (i.e., you can keep around old versions of the data structure and the only cost you pay is the memory).
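To make that concrete, here's a tiny sketch using Data.Sequence from the containers package, which is built on exactly this kind of finger tree. The operations used are the library's real API, but the example itself is just illustrative:

    import Data.Sequence (Seq, ViewL (..), viewl, (<|), (><), (|>))
    import qualified Data.Sequence as Seq

    main :: IO ()
    main = do
      let s  = Seq.fromList [2, 3, 4 :: Int]
          s' = (1 <| s) |> 5   -- amortized O(1) push onto either end
          t  = s' >< s'        -- O(log n) concatenation
      case viewl s' of         -- amortized O(1) view/pop from the left
        x :< rest -> print (x, rest)
        EmptyL    -> putStrLn "empty"
      print (Seq.index t 4)    -- O(log n) random access
      print s'                 -- s' is unchanged by all of the above: persistence for free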
2-3 trees actually do give you log-time concatenation; most species of balanced trees do, including red-black, AVL, splay (amortized, of course), skip lists, and treaps.
The answer is that each modification costs O(1) amortized time when you have a 4.
It even gives a little anecdote in the acknowledgements on the last page:
"On October 29, 1979, Mehlhorn presented the results of [4] in Zurich. Ed McCreight was in the
audience and asked, 'Is a similar result true for 2-3 trees?' The counterexample shown in Fig. 3
was described. He then asked, 'How about 10-50 trees?'"
(Ed McCreight co-invented the B-tree)
([4] is Blum, N., Mehlhorn, K.: On the average number of rebalancing steps in weight-balanced trees. Theor. Comput. Sci. 11, 303-320 (1980).)
However, you don't actually have to understand all of this to get the gist of why 2-3 trees don't have constant-amortized-time modifications. You can see that from a simple little experiment on binary numbers.
Imagine that you had to implement binary arithmetic on your own. The first thing you implement is a binary counter with only increment. It starts at 0, and you can add 1 as many times as you like. That's it.
Note that a counter with n bits may require n bit flips to increment 11111...1. However, by assigning a binary number a potential equal to the number of 1 bits, you can see that incrementing takes O(1) amortized bit flips using the physicist's method.
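Here is a minimal sketch of that counter in Haskell, under the assumption that bits are stored least-significant first (the names Bits and incBits are just for illustration):

    -- An increment-only binary counter, least significant bit first.
    type Bits = [Int]

    incBits :: Bits -> Bits
    incBits []       = [1]             -- grow the counter with a new leading 1
    incBits (0 : bs) = 1 : bs          -- one flip, no carry
    incBits (1 : bs) = 0 : incBits bs  -- flip the 1 to 0 and carry onward

    -- Each 1 flipped back to 0 releases one unit of potential (the count of
    -- 1 bits), which pays for the carry; only the final 0 -> 1 flip is
    -- charged to the increment, so increments cost O(1) amortized flips.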
This doesn't work when you can also subtract one from your counter, since subtracting one can turn a lot of 0s into 1s. Adding one again turns them back, and you're back where you started, having spent no potential yet requiring a bunch of bit flips in each operation.
However, if you allow each digit to be 0, 1, or 2, you can assign a positive potential to the 0 and 2 digits and zero potential to the 1 digits to show that both increment and decrement take O(1) amortized digit changes.
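And a sketch of that redundant counter, under the same least-significant-first assumption (inc, dec, and toInt are illustrative names; each digit is 0, 1, or 2, and the ith digit still has weight 2^i):

    -- Redundant binary counter: each digit is 0, 1, or 2, least significant
    -- digit first, and the ith digit has weight 2^i.
    type Counter = [Int]

    inc :: Counter -> Counter
    inc []       = [1]
    inc (0 : ds) = 1 : ds      -- cheap: releases the potential stored on the 0
    inc (1 : ds) = 2 : ds      -- cheap: pays one unit into the new 2
    inc (2 : ds) = 1 : inc ds  -- 2*2^i + 2^i = 1*2^i + 1*2^(i+1), so carry upward

    dec :: Counter -> Counter
    dec []       = error "underflow"
    dec (2 : ds) = 1 : ds      -- cheap: releases the potential stored on the 2
    dec (1 : ds) = 0 : ds      -- cheap: pays one unit into the new 0
    dec (0 : ds) = 1 : dec ds  -- 0*2^i - 2^i = 1*2^i - 1*2^(i+1), so borrow upward

    -- Back to an ordinary integer, e.g.
    -- toInt [2,1] == toInt [0,2] == toInt [0,0,1] == 4.
    toInt :: Counter -> Int
    toInt = foldr (\d acc -> d + 2 * acc) 0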
The same thing happens with 2-3 trees vs. 2-3-4 trees. I believe 2-3 trees have amortized O(1) insert if deletion is not permitted, but 2-3-4 trees support both insert and delete in O(1) amortized time.
Number systems have some deep connections to data structure design like this. One place to read more is Okasaki's book, "Purely Functional Data Structures", though plenty of free resources exist on the topic as well - Amr Elmasry tends to publish about it a lot. Once you get it, you start to see the pattern in many other places.
Clarification: In the example with 0, 1, and 2 as allowed digits, I still intended for the number to be "binary" in the sense that the ith digit has weight 2^i. Thus, 12 in this system is 1*2^1 + 2*2^0 = 4 in decimal, and, in this system, 12 = 20 = 100.
My understanding is that 4 is a relatively sweet spot for allowing efficient consing/deconsing of the finger nodes. I became fascinated by finger trees a few years ago and created several toy implementations. I recall that the size of the fingers did impact the performance of various operations, though I no longer remember the details.
Sadly, I do recall that the constant factors were prohibitively high, which made finger trees rather unsuited as an everyday list/array data structure for most applications. It would certainly simplify programming to have one sequential collection type that is adequately performant at everything.
> My understanding is that 4 is a relatively sweet spot for allowing efficient consing/deconsing of the finger nodes.
I doubt it. For the persistent data structures I've implemented (persistent hash maps, vectors, and B+ trees), a branching factor of 32 or 64 worked best. A high branching factor keeps the depth of the tree very small. Copying a block of memory a few times is cheaper than copying a little memory block a lot of times.
Copying the fingers is the most common operation: every cons/decons requires copying one of the fingers. Operations that propagate deeper into the tree are lazy, so the primary factor affecting the performance of common operations is the time required to copy one of the fingers. Using a higher branching factor would just mean copying a larger memory block many times.