Yep, this is the thought process I had; a red-black tree or B-tree is something I'd consider for a map that needed sort order, and a hashmap would be what I'd use by default if I didn't care about order.
For additional context, I work almost entirely in Rust nowadays, and BTreeMap and HashMap are the two map types in the standard library (https://doc.rust-lang.org/std/collections/index.html). I've often needed insertion-ordered maps in my work, though, which isn't something I've seen taught very often or included in standard libraries. There are plenty of third-party implementations out there, and in most languages it's not too difficult to hand-roll one from an unordered map and a linked list: store in the map both the value and a reference to the linked-list node containing the element, which gives constant-time insertion, lookup, and deletion. (Rust is somewhat notorious for _not_ being a language where implementing something like this by hand is easy, though, since tracking mutable references to individual linked-list nodes across separate hashmap values is not something the borrow checker is enthusiastic about.)
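A minimal sketch of the kind of workaround the borrow checker does tolerate: instead of pointers into a linked list, keep entries in a Vec and store indices in the HashMap. Removal leaves a tombstone rather than unlinking a node, which is a simplification of the linked-list design described above, but insert, lookup, and removal are still O(1) and iteration still follows insertion order. The name `InsertionOrderedMap` is made up for illustration.

```rust
use std::collections::HashMap;
use std::hash::Hash;

// Entries live in `slots` in insertion order; `index` maps each key to its
// slot. Removal replaces the slot with None (a tombstone) so no indices shift.
struct InsertionOrderedMap<K, V> {
    slots: Vec<Option<(K, V)>>,
    index: HashMap<K, usize>,
}

impl<K: Eq + Hash + Clone, V> InsertionOrderedMap<K, V> {
    fn new() -> Self {
        Self { slots: Vec::new(), index: HashMap::new() }
    }

    fn insert(&mut self, key: K, value: V) {
        if let Some(&i) = self.index.get(&key) {
            // Updating an existing key keeps its original position.
            self.slots[i] = Some((key, value));
        } else {
            self.index.insert(key.clone(), self.slots.len());
            self.slots.push(Some((key, value)));
        }
    }

    fn get(&self, key: &K) -> Option<&V> {
        let &i = self.index.get(key)?;
        self.slots[i].as_ref().map(|(_, v)| v)
    }

    fn remove(&mut self, key: &K) -> Option<V> {
        let i = self.index.remove(key)?;
        self.slots[i].take().map(|(_, v)| v)
    }

    // Iterates in insertion order, skipping tombstones.
    fn iter(&self) -> impl Iterator<Item = (&K, &V)> {
        self.slots.iter().filter_map(|s| s.as_ref().map(|(k, v)| (k, v)))
    }
}

fn main() {
    let mut m = InsertionOrderedMap::new();
    m.insert("one", 1);
    m.insert("two", 2);
    m.insert("three", 3);
    m.remove(&"two");
    for (k, v) in m.iter() {
        println!("{k} = {v}"); // prints "one" then "three"
    }
}
```

The obvious downside versus the real linked-list version is that tombstones accumulate, so a long-lived map with heavy churn would want periodic compaction (or an actual index-based doubly-linked list, as crates like indexmap's cousins do).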
Not really. Hashmaps don't have guaranteed performance characteristics and require the "correct" hash function to work properly. In particular, their performance degrades as they become more "full", requiring a re-allocation to a table of generally n-times the size. Balanced trees could be regarded as a more generic solution to the mapping problem and are more suitable in many applications.
Trees require a lot more memory indirections than hashmaps, which makes them slower to traverse. They require a lot more interior pointers, which makes them bigger. It’s easy for a hashmap to beat a tree.
Yes really, that’s why the vast majority of languages default to a hashmap (if they even have tree-based maps at all). Turns out the average application is a lot more interested in a low-constant-factor O(1) best case than in a high-constant-factor O(log n) worst case.
> Hashmaps don't have guaranteed performance characteristics
Of course they do.
> require the "correct" hash function to work properly.
And tree maps require correct comparators to work correctly.
> In particular, their performance degrades as they become more "full" requiring a re-allocation to a table of generally n-times the size.
Yes? Also “n-times” seems like pretty fuddy fud. And it seems weird to be more worried by the reallocation of a hashmap when a tree requires an allocation per node, so your average balanced tree requires a heap allocation per entry. And two pointers, for up to 16x overhead not including the allocator’s own metadata.
> Balanced trees could be regarded a more generic solution to the mapping problem and are more suitable in many applications.
And the amphibious cycle could be regarded as the more generic solution to the transport problem.
>> Hashmaps don't have guaranteed performance characteristics
>Of course they do
No, they do not, at least not worst-case O(1). If you write a very poor (but functional!) hash function, or an adversary is choosing your items, then congratulations: your hashmap is now a linked list with extra steps.
Sure, you could use a robust cryptographic hash function, but you didn't because (a) it's not the default in your language and (b) it's slow and you are "a lot more interested in a low-constant-factor O(1) best case".
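To make that failure mode concrete, here's a sketch of a "very poor but functional" hash in Rust: a hasher that maps every key to the same value. The map still returns correct answers, but every entry lands in one bucket, so each operation degrades to a linear scan. `ConstantHasher` and `ConstantBuild` are names invented for this example.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasher, Hasher};

// A deliberately terrible hasher: ignores its input and always returns 0.
// Perfectly "functional" per the Hasher contract, catastrophic for performance.
#[derive(Default)]
struct ConstantHasher;

impl Hasher for ConstantHasher {
    fn finish(&self) -> u64 { 0 }
    fn write(&mut self, _bytes: &[u8]) {}
}

#[derive(Default)]
struct ConstantBuild;

impl BuildHasher for ConstantBuild {
    type Hasher = ConstantHasher;
    fn build_hasher(&self) -> ConstantHasher { ConstantHasher }
}

fn main() {
    // Still a correct map; every lookup is now O(n) instead of O(1).
    let mut m: HashMap<u32, u32, ConstantBuild> = HashMap::default();
    for i in 0..1000 {
        m.insert(i, i * 2);
    }
    assert_eq!(m.get(&500), Some(&1000));
    println!("{} entries, all colliding", m.len());
}
```

An adversary doesn't need you to write this by hand, of course; with a weak-but-reasonable-looking hash they can often construct keys that collide on purpose, which is exactly why languages like Rust default to a keyed, DoS-resistant hash (SipHash) rather than the fastest one available.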
I mean it's kind of pedantic, but yes, hash maps do have guaranteed performance characteristics. They are not guaranteed to be O(1), but they will never be something like O(2^N) either. You can give guarantees about their performance if you can make guarantees about the distribution of the inputs and the properties of the hash used.
In my own experience, most uses of hash maps are typically O(log n). A good rule of thumb is hash maps are preferable unless you need predictable worst case performance or your hash maps are directly exposed to the public.
> A good rule of thumb is hash maps are preferable unless you need predictable worst case performance or your hash maps are directly exposed to the public.
Also, obviously, features only one of them provides: e.g. if you need range queries, a hash table is quite useless. Meanwhile, if you need insertion ordering, that's reasonably easy to do with a hashmap.
Also if it's big enough that recursing will blow out your stack.
Yes, you can throw the nodes on an array in the heap to avoid the recursion, but at that point you're going to be losing to a hash map on all but the biggest datasets, and even then the copies really aren't desirable.
> Yes? Also “n-times” seems like pretty fuddy fud. And it seems weird to be more worried by the reallocation of a hashmap when a tree requires an allocation per node, so your average balanced tree requires a heap allocation per entry.
Yes, a tree requires an allocation for every entry, whereas a hash map requires one for only some entries, and you don't know which ones, so real-time constraints are out the window. But no, it's not something your average web programmer (including me) needs to worry about. Some people do, though.