Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

A whole post about hash tables and he doesn't mention the real reason trees are better than hash tables: lower variance!

> Hash tables become full, and bad things happen

The bad thing is not a crazy long probe or even rehashing to enlarge the table—it's that some random individual insert is stuck eating the entire cost of rebuilding the table. For many applications, I'm happy with a slightly higher cost per operation in return for predictable performance.

> Hash functions are slow

Great, another unmentioned footgun - hash functions are hard. For example, the hash function he provides doesn't use a prime number of buckets. Oops, non-uniform input data in combination with unlucky bucket counts can generate high numbers of collisions.

Writing a comparison function to put your data in a tree is pretty stupidproof!



There are resizing algorithms out there which specifically avoid the single-insertion-spike that you're worried about.

http://stackoverflow.com/a/2200345

If you're working with small data sizes, the difference between O(1) and O(lgN) isn't that big a deal, so you can use whatever you want. But once you start working with larger datasets containing millions of elements, and the only thing you care about is insertions and exact-match-lookups, it's hard to justify using a tree over a hash map.


For similar reasons I prefer reference counting over (tracing) garbage collection, especially in near-realtime applications, such as audio.




Consider applying for YC's Winter 2026 batch! Applications are open till Nov 10

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: