I see that, when you create several maps that have the same keys but with different values, the key-set is shared between instances, and this is basically what brings this particular implementation of maps up-to-par with replacing Erlang records in their use-case.
I'm curious whether this efficiency in storage translates to efficiency in messaging, though:
• When you pass a map N times to another process, is the key-set copied N times? (Not that bad if true, so I'm guessing so.)
• When you pass a map N times over the distribution protocol, is the key-set serialized and transmitted N times? (This'd be pretty bad if true.)
The first one: yes, it has to be copied. Otherwise the keyset would be shared among processes, and sharing heap data between processes is generally not possible in Erlang (immutable binary data being the exception).
The second one: yes and no. The format is described at http://erlang.org/doc/apps/erts/erl_ext_dist.html, and maps are laid out as a serialized construction like you say. Two points balance this out: there is an atom cache which allows you to cache atoms and make their representation small, and you can zlib-compress the data.
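To see what "laid out as a serialized construction" means concretely, here is a toy Python sketch of how the external term format encodes a map: a version byte, the MAP_EXT tag, the arity, then every key written in full next to its value. The tag values (131, 116, 119, 97) are from the erl_ext_dist page; the helper functions themselves are mine, and only small atoms/integers are handled.

```python
import struct
import zlib

def encode_atom(name: str) -> bytes:
    data = name.encode("utf-8")
    return bytes([119, len(data)]) + data          # SMALL_ATOM_UTF8_EXT

def encode_small_int(n: int) -> bytes:
    return bytes([97, n])                          # SMALL_INTEGER_EXT, 0..255

def encode_map(pairs: dict) -> bytes:
    body = struct.pack(">BI", 116, len(pairs))     # MAP_EXT tag + 32-bit arity
    for key, value in pairs.items():
        body += encode_atom(key) + encode_small_int(value)
    return bytes([131]) + body                     # 131 = format version byte

# Two maps with the same key-set: the keys appear in full in both encodings,
# so sending N such maps transmits the key-set N times over the wire ...
a = encode_map({"name": 1, "age": 2})
b = encode_map({"name": 3, "age": 4})
assert a.count(b"name") == 1 and b.count(b"name") == 1

# ... but zlib compression recovers much of the redundancy across a stream
# of terms that repeat the same key-set.
stream = b"".join(encode_map({"name": i % 250, "age": i % 250})
                  for i in range(100))
print(len(stream), len(zlib.compress(stream)))
```

The atom cache addresses the same redundancy at the level of individual atoms rather than whole key-sets.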
It is not set in stone yet and might even change in a later version. The term format is versioned, so it can be upgraded later if need be.
Right, what I was asking in the second question is basically whether there exists, or is planned, a keyset cache to go along with the atom cache. I'd think that, if the atom cache is a good idea, a keyset cache would be good for exactly the same reasons.
If you do that, it is better to build an arbitrary caching construction over subtrees and then reap the benefit of tighter packing of data in general. I agree a keyset cache could be really nice to have going forward. It would resemble what happens in-heap.
But the rule of the Ericsson OTP team is to get it correct before making it fast.
Atoms and keysets both have pretty much the same caching semantics, though: they get repeated over the wire with pretty good locality, and don't have conflicting terms busting the cache in-between. That's not really true for anything else, which makes me guess that an arbitrary term-branch cache would be pretty useless.
The idea of having arbitrary subtrees is to support an efficient compression scheme. But you are indeed right that a keyset cache would be extremely effective at limiting the size of maps.
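The keyset-cache idea discussed above might work along these lines: the first time a key-set crosses the wire it is sent in full and assigned a slot, and later maps with the same key-set send only the slot id plus their values, much like atom cache references. Everything here (the class names, the wire tuples) is invented for illustration; no such mechanism exists in erl_ext_dist today.

```python
class KeysetCacheSender:
    def __init__(self):
        self.slots = {}            # frozenset of keys -> slot id

    def encode(self, m: dict):
        keyset = tuple(sorted(m))
        if keyset in self.slots:
            # cache hit: ship only the slot id and the values
            return ("ref", self.slots[keyset], [m[k] for k in keyset])
        slot = len(self.slots)
        self.slots[keyset] = slot
        # cache miss: ship the full key-set once, with its new slot id
        return ("full", slot, keyset, [m[k] for k in keyset])

class KeysetCacheReceiver:
    def __init__(self):
        self.slots = {}            # slot id -> key tuple

    def decode(self, msg):
        if msg[0] == "full":
            _, slot, keyset, values = msg
            self.slots[slot] = keyset
        else:
            _, slot, values = msg
            keyset = self.slots[slot]
        return dict(zip(keyset, values))

sender, receiver = KeysetCacheSender(), KeysetCacheReceiver()
wire = [sender.encode({"name": i, "age": i + 1}) for i in range(3)]
assert wire[0][0] == "full"          # keys cross the wire once ...
assert wire[1][0] == "ref"           # ... then only values + slot id
assert receiver.decode(wire[0]) == {"name": 0, "age": 1}
assert receiver.decode(wire[1]) == {"name": 1, "age": 2}
```

This mirrors the in-heap sharing mentioned above: per-connection state trades a little bookkeeping for not re-sending identical key-sets.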