
You might be surprised just how much churn there is if you look at the source of, for instance, Haskell's Data.Map, which in the name of immutability is implemented as a balanced binary tree rather than the more common hash table. Many operations that are amortized O(1) with a hash table are O(log n) with a balanced binary tree, since, for instance, after an update every node on the path from the root to the update site is made afresh, and rebalancing changes nodes around that path as well.
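To make the churn concrete, here's a minimal sketch (the names m0/m1 are just for illustration): Data.Map's insert returns a new map, copying the spine down to the update site, while the original stays valid.

    import qualified Data.Map as Map

    main :: IO ()
    main = do
      let m0 = Map.fromList [(1, "a"), (2, "b"), (3, "c")]
          m1 = Map.insert 4 "d" m0  -- O(log n): the path to the new leaf is rebuilt
      print (Map.toList m0)         -- [(1,"a"),(2,"b"),(3,"c")], m0 is unchanged
      print (Map.toList m1)         -- [(1,"a"),(2,"b"),(3,"c"),(4,"d")]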



The balanced binary tree of Data.Map is nowhere near the state of the art in persistent data structures. Clojure (and Haskell's unordered-containers library) uses Hash Array Mapped Tries (HAMTs), which have a much higher branching factor (32-ary) than binary trees. In practice the difference from constant time is negligible.
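For comparison, a small sketch with the HashMap from unordered-containers (the HAMT mentioned above); the API mirrors Data.Map, and the shallow 32-ary tree makes updates effectively constant time:

    import qualified Data.HashMap.Strict as HM

    main :: IO ()
    main = do
      let h0 = HM.fromList [("x", 1), ("y", 2)] :: HM.HashMap String Int
          h1 = HM.insert "z" 3 h0   -- only log_32 n levels deep, tiny in practice
      print (HM.lookup "z" h0)      -- Nothing: h0 is still the old version
      print (HM.lookup "z" h1)      -- Just 3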


Somehow the difference between O(1) and O(log32 n) has never actually been an issue for me in Clojure.


Have you profiled the difference? Do you use these containers in places where it matters?


I'm sure there is a difference, I just don't care. They're still damn fast and the semantics of immutability would be worth quite a large speed difference to me.

If you do happen to be doing something where the difference is truly going to matter, or you're writing a library anticipating needing that kind of performance, using mutability internally in your functions isn't discouraged as long as they remain referentially transparent: http://clojure.org/transients


That's a good point. People don't understand that functional purity in Haskell, while a nice goal, isn't a law; you can break it if you have a good reason to. IO is sort of the canonical example: it isn't there to prevent you from doing IO, it's there to act as a giant warning sign that this function, or one of the functions it uses, isn't pure.

Likewise, mutable data structures, while not the default, can be implemented in Haskell; they just have some caveats attached and shouldn't be used lightly. In other words, all things being equal, use the immutable data structure, but if you have a good reason to, you can use a mutable data structure instead; just be sure you know what you're getting into.
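For instance, a minimal sketch of an explicitly mutable structure, an unboxed mutable array from the array package; the caveat is that every read and write lives in IO:

    import Data.Array.IO (IOUArray, newArray, readArray, writeArray)

    main :: IO ()
    main = do
      arr <- newArray (0, 9) 0 :: IO (IOUArray Int Int)
      writeArray arr 3 42   -- a genuine in-place update, no copying
      v <- readArray arr 3
      print v               -- 42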


The problem is that "all things being equal" never happens because immutable data structures have worse performance if not used in a way that benefits from persistence. IMO, that makes them a poor default.


I'd rather say that 'performance+ convenience/safety-' is a poor default: minute performance improvements matter only in the bottlenecks of your code, so it makes sense to use the versatile though slightly slower implementations as the default, and have the 100%-speed option require some explicit code, since it's needed only in a minority of your program's segments.

:) You could say that "all things being equal" never happens because mutable data structures have worse persistence if they're not used in a way that their performance matters. But also, different defaults make sense for different problem domains.


Pretty much everything except performance benefits from immutability and persistence, which is why using anything else as the default seems insane to me.


And Haskell's ST monad.
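Which gives you the same pattern as transients: mutate freely inside runST, expose a pure result. A minimal sketch (sumST is a made-up name), summing a list through a mutable STRef while remaining referentially transparent from the outside:

    import Control.Monad.ST (runST)
    import Data.STRef (newSTRef, modifySTRef', readSTRef)

    sumST :: [Int] -> Int
    sumST xs = runST $ do
      ref <- newSTRef 0                       -- mutable cell, local to this action
      mapM_ (\x -> modifySTRef' ref (+ x)) xs -- in-place accumulation
      readSTRef ref                           -- the mutation never escapes

    main :: IO ()
    main = print (sumST [1 .. 10])  -- 55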


What do you mean by O(log32 n)?


Clojure's data structures are implemented using trees that have a 32-way branching factor, so each node has 32 children, instead of the 2 you'd see in a binary search tree. A lot of things you might expect to be O(1), like checking for membership in a set, or getting the nth element of a vector, actually take O(log_32(n)).
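To get a feel for how flat that is, even a billion elements need only about six node hops:

    main :: IO ()
    main = print (logBase 32 1e9)  -- ≈ 5.98, so a ~6-level tree covers 10^9 entries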


Ah, I see. That doesn't matter in big-O notation, though, because log_32(n) = log_2(n) / log_2(32); that is, the difference between two bases is a constant factor.

I was a bit confused because O(log_32(n)) = O(log(n)), and even O(log(32n)) = O(log(32) + log(n)) = O(log(n)), so no matter how I parsed it I would just get O(log(n)) :)


Also, C++, the king of mutability, implements map as a tree instead of a hash table, probably to avoid a hash table's worst-case linear lookup.


The standards committee could not come to a complete agreement on hash tables, so they were not part of C++03. Hash tables were under consideration alongside the tree-based map; it was not one versus the other. They wanted both, and only the tree-based map was ready in time.

Eventually, by the time they came to an agreement on hash tables (C++11), a lot of vendors had their own versions as non-standard additions to the library. To avoid conflicts with these existing hash tables, the standard one was named "unordered_map".


The C++ standard places a number of requirements on std::map (ordered traversal, stable iterators/references, logarithmic operations) which really only line up with balanced binary trees. As another poster pointed out, C++11 introduced std::unordered_map (a hash table).


unordered_map?



Of course, there's Data.HashMap.HashMap[1], which is just a Data.IntMap.{Lazy,Strict}.IntMap[2], whose "implementation is based on big-endian patricia trees".

> Many operations have a worst-case complexity of O(min(n,W)). This means that the operation can become linear in the number of elements with a maximum of W -- the number of bits in an Int (32 or 64).

[1] http://hackage.haskell.org/packages/archive/hashmap/1.3.0.1/... [2] http://hackage.haskell.org/packages/archive/containers/lates...
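For completeness, a quick sketch of using that IntMap directly from the containers package:

    import qualified Data.IntMap.Strict as IntMap

    main :: IO ()
    main = do
      let im = IntMap.fromList [(1, "one"), (1024, "2^10")]
      print (IntMap.lookup 1024 im)                      -- Just "2^10"
      print (IntMap.toList (IntMap.insert 7 "seven" im)) -- keys come back in order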



