A whole post about hash tables and he doesn't mention the real reason trees are better than hash tables: lower variance!
> Hash tables become full, and bad things happen
The bad thing is not a crazy long probe or even rehashing to enlarge the table—it's that some random individual insert is stuck eating the entire cost of rebuilding the table. For many applications, I'm happy with a slightly higher cost per operation in return for predictable performance.
> Hash functions are slow
Great, another unmentioned footgun - hash functions are hard. For example, the hash function he provides doesn't use a prime number of buckets. Oops, non-uniform input data in combination with unlucky bucket counts can generate high numbers of collisions.
Writing a comparison function to put your data in a tree is pretty stupidproof!
If you're working with small data sizes, the difference between O(1) and O(lgN) isn't that big a deal, so you can use whatever you want. But once you start working with larger datasets containing millions of elements, and the only thing you care about is insertions and exact-match-lookups, it's hard to justify using a tree over a hash map.
> Hash tables become full, and bad things happen
The bad thing is not a crazy long probe or even rehashing to enlarge the table—it's that some random individual insert is stuck eating the entire cost of rebuilding the table. For many applications, I'm happy with a slightly higher cost per operation in return for predictable performance.
> Hash functions are slow
Great, another unmentioned footgun - hash functions are hard. For example, the hash function he provides doesn't use a prime number of buckets. Oops, non-uniform input data in combination with unlucky bucket counts can generate high numbers of collisions.
Writing a comparison function to put your data in a tree is pretty stupidproof!