
You might be surprised just how much churn there is if you look at the source of, for instance, Haskell's Data.Map, which in the name of immutability is implemented as a balanced binary tree rather than the more common hash table. Many operations that are amortized O(1) with a hash table are O(log n) with a balanced binary tree, since, for instance, after an update every node on the path from the root to the update site is made afresh, and rebalancing changes nodes around that path as well.
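To make the churn concrete, here's a minimal sketch (the names m0/m1 are just for illustration): Data.Map's insert returns a new map, copying the spine down to the update site, while the original stays valid.

    import qualified Data.Map as Map

    main :: IO ()
    main = do
      let m0 = Map.fromList [(1, "a"), (2, "b"), (3, "c")]
          m1 = Map.insert 4 "d" m0  -- O(log n): the path to the new leaf is rebuilt
      print (Map.toList m0)         -- [(1,"a"),(2,"b"),(3,"c")], m0 is unchanged
      print (Map.toList m1)         -- [(1,"a"),(2,"b"),(3,"c"),(4,"d")]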



The balanced binary tree of Data.Map is nowhere near the state of the art in persistent data structures. Clojure (and Haskell's unordered-containers library) uses Hash Array Mapped Tries (HAMTs), which have a much higher branching factor (32-ary) than binary trees. In practice the difference from constant time is negligible.
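For comparison, a small sketch with the HashMap from unordered-containers (the HAMT mentioned above); the API mirrors Data.Map, and the shallow 32-ary tree makes updates effectively constant time:

    import qualified Data.HashMap.Strict as HM

    main :: IO ()
    main = do
      let h0 = HM.fromList [("x", 1), ("y", 2)] :: HM.HashMap String Int
          h1 = HM.insert "z" 3 h0   -- only log_32 n levels deep, tiny in practice
      print (HM.lookup "z" h0)      -- Nothing: h0 is still the old version
      print (HM.lookup "z" h1)      -- Just 3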


Somehow the difference between O(1) and O(log32 n) has never actually been an issue for me in Clojure.


Have you profiled the difference? Do you use these containers in places where it matters?


I'm sure there is a difference, I just don't care. They're still damn fast and the semantics of immutability would be worth quite a large speed difference to me.

If you do happen to be doing something where the difference is truly going to matter, or you're writing a library anticipating needing that kind of performance, using mutability internally in your functions isn't discouraged as long as they remain referentially transparent: http://clojure.org/transients


That's a good point. People don't understand that functional purity in Haskell, while a nice goal, isn't a law; you can break it if you have a good reason to. IO is sort of the canonical example: it isn't there to prevent you from doing IO, it's there to act as a giant warning sign that this function, or one of the functions it uses, isn't pure.

Likewise, mutable data structures, while not the default, can be implemented in Haskell; they just have some caveats attached and shouldn't be used lightly. In other words, all things being equal, use the immutable data structure, but if you have a good reason to, you can use a mutable data structure instead; just be sure you know what you're getting into.
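For instance, a minimal sketch of an explicitly mutable structure, an unboxed mutable array from the array package; the caveat is that every read and write lives in IO:

    import Data.Array.IO (IOUArray, newArray, readArray, writeArray)

    main :: IO ()
    main = do
      arr <- newArray (0, 9) 0 :: IO (IOUArray Int Int)
      writeArray arr 3 42   -- a genuine in-place update, no copying
      v <- readArray arr 3
      print v               -- 42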


The problem is that "all things being equal" never happens because immutable data structures have worse performance if not used in a way that benefits from persistence. IMO, that makes them a poor default.


I'd rather say that 'performance+ convenience/safety-' is a poor default: minute performance improvements matter only in the bottlenecks of your code, so it makes sense to use the versatile though slightly slower implementations as the default, and have the 100%-speed option require some explicit code, since it's needed only in a minority of your program's segments.

:) You could say that "all things being equal" never happens because mutable data structures have worse persistence if they're not used in a way that their performance matters. But also, different defaults make sense for different problem domains.


Pretty much everything except performance benefits from immutability and persistence, which is why using anything else as the default seems insane to me.


And Haskell's ST monad.
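Which gives you the same pattern as transients: mutate freely inside runST, expose a pure result. A minimal sketch (sumST is a made-up name), summing a list through a mutable STRef while remaining referentially transparent from the outside:

    import Control.Monad.ST (runST)
    import Data.STRef (newSTRef, modifySTRef', readSTRef)

    sumST :: [Int] -> Int
    sumST xs = runST $ do
      ref <- newSTRef 0                       -- mutable cell, local to this action
      mapM_ (\x -> modifySTRef' ref (+ x)) xs -- in-place accumulation
      readSTRef ref                           -- the mutation never escapes

    main :: IO ()
    main = print (sumST [1 .. 10])  -- 55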


What do you mean by O(log32 n)?


Clojure's data structures are implemented using trees that have a 32-way branching factor, so each node has 32 children, instead of the 2 you'd see in a binary search tree. A lot of things you might expect to be O(1), like checking for membership in a set, or getting the nth element of a vector, actually take O(log_32(n)).
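To get a feel for how flat that is, even a billion elements need only about six node hops:

    main :: IO ()
    main = print (logBase 32 1e9)  -- ≈ 5.98, so a ~6-level tree covers 10^9 entries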


Ah, I see. That doesn't matter in big-O notation, though, because log_32(n) = log_2(n) / log_2(32); that is, the difference between two bases is a constant factor.

I was a bit confused because O(log_32(n)) = O(log(n)), and even O(log(32n)) = O(log(32) + log(n)) = O(log(n)), so no matter how I parsed it I would just get O(log(n)) :)


Also, C++, the king of mutability, implements map as a tree instead of a hash table, probably to avoid a hash table's worst-case linear lookup.


The standards committee could not come to a complete agreement on hash tables, so they were not part of C++03. Hash tables were under consideration alongside the tree-based map; it was not one versus the other. They wanted both, and only the tree-based map was ready in time.

Eventually, by the time they came to an agreement on hash tables (C++11), a lot of vendors had their own versions as non-standard additions to the library. To avoid conflicts with these existing hash tables, the standard one was named "unordered_map".


The C++ standard places a number of requirements on std::map (ordered traversal, stable iterators/references, logarithmic operations) which really only line up with balanced binary trees. As another poster pointed out, C++11 introduced std::unordered_map (a hash table).


unordered_map?



Of course, there's Data.HashMap.HashMap[1], which is just a Data.IntMap.{Lazy,Strict}.IntMap[2], whose "implementation is based on big-endian patricia trees".

> Many operations have a worst-case complexity of O(min(n,W)). This means that the operation can become linear in the number of elements with a maximum of W -- the number of bits in an Int (32 or 64).

[1] http://hackage.haskell.org/packages/archive/hashmap/1.3.0.1/... [2] http://hackage.haskell.org/packages/archive/containers/lates...
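For completeness, a quick sketch of using that IntMap directly from the containers package:

    import qualified Data.IntMap.Strict as IntMap

    main :: IO ()
    main = do
      let im = IntMap.fromList [(1, "one"), (1024, "2^10")]
      print (IntMap.lookup 1024 im)                      -- Just "2^10"
      print (IntMap.toList (IntMap.insert 7 "seven" im)) -- keys come back in order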



