Storing state in Erlang with processes (2014) (dantswain.herokuapp.com)
88 points by brudgers on May 31, 2019 | 6 comments


That's a good introductory article in that it starts from the very basic idea of a simple assignment and builds up to a gen_server.

This explicit state management and immutability is very nice in large applications. You can see what is happening in a piece of code just by looking at the local context (a function or a module), since you have the initial state, the update operation, and the new state right there. If something goes wrong, tracing just that function can often reveal the error. Contrast that with an implicit "this" object behind multiple levels of class inheritance, where things become a lot more complicated.
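
For a concrete picture, here is a minimal sketch of the pattern the article builds toward (module and function names are mine, not the article's): state lives in the argument of a tail-recursive receive loop, and every update is an explicit recursive call carrying the new state.

    -module(counter).
    -export([start/0, increment/1, value/1]).

    start() ->
        spawn(fun() -> loop(0) end).

    increment(Pid) ->
        Pid ! increment.

    value(Pid) ->
        Pid ! {value, self()},
        receive
            {counter_value, N} -> N
        end.

    %% The current count is the loop argument; the "new state" is
    %% whatever we pass to the next recursive call.
    loop(Count) ->
        receive
            increment ->
                loop(Count + 1);
            {value, From} ->
                From ! {counter_value, Count},
                loop(Count)
        end.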

Further, when it comes to state management, process heap isolation is invaluable. If something goes wrong, you can safely restart just some parts of the application without leaving others in an unknown state. Operating systems figured this out decades ago, and concurrency units that share a single heap feel like a return to the Windows 3.1 world, where your calculator crashing also took down your editor.
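
The restart part is plain OTP supervision. A minimal sketch, assuming the counter module above exposed a start_link/0 instead: with a one_for_one strategy, only the crashed child is restarted, and its siblings keep their heaps untouched.

    -module(counter_sup).
    -behaviour(supervisor).
    -export([start_link/0, init/1]).

    start_link() ->
        supervisor:start_link({local, ?MODULE}, ?MODULE, []).

    init([]) ->
        SupFlags = #{strategy => one_for_one, intensity => 5, period => 10},
        Children = [#{id => counter,
                      start => {counter, start_link, []},
                      restart => permanent}],
        {ok, {SupFlags, Children}}.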


>You can see what is happening in a piece of code just by looking at the local context

^ This. A thousand times this.

When I read this article, all I could think was that _this_ is what object-oriented code was meant to be about: message passing.

I have been working on a Rails codebase and it's an absolute nightmare: unless you know everything all at once, you can come across a tiny class with a bunch of identifiers set via inheritance or resolved somehow via inflection. It's completely mad.

Not meaning to start a flame war, but it confuses me how OO went from the simple idea of encapsulated "objects" with well-defined interfaces to this crazy spaghetti mess of inheritance and the rest.

That said, bad code is bad code and you can ship a ball of mud in anything...


Hmm, it's often better to store state in ETS, so that when the process does crash it can be restarted by the supervisor without losing its state.
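
Roughly like this hedged sketch (table, key, and callback names are illustrative): the table is created by a long-lived owner such as the supervisor, so a crashing worker can pick its state back up on restart. The init/1 and handle_call/3 fragments are from a hypothetical gen_server callback module.

    %% In a long-lived owner (e.g. the supervisor), create the table once:
    counter_state = ets:new(counter_state, [named_table, public, set]).

    %% In the worker's init/1 -- pick up whatever the previous
    %% incarnation left behind:
    init([]) ->
        Count = case ets:lookup(counter_state, count) of
                    [{count, N}] -> N;
                    [] -> 0
                end,
        {ok, Count}.

    %% Write through to the table on every update:
    handle_call(increment, _From, Count) ->
        ets:insert(counter_state, {count, Count + 1}),
        {reply, ok, Count + 1}.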


This depends highly on what you are storing. Letting a process reuse "old" state after a crash can easily cause a restart loop.


That depends on the use case. Sometimes you do want to start with a clean state: if the process crashed because its state was invalid in the first place, restarting it with that same state is useless.


There’s a higher, systems-architecture-level equivalent to this, too (which Erlang is also designed to take advantage of, via the Erlang distribution protocol and the Mnesia DBMS), and that’s the idea that “durability is replication.”

That is, there’s no semantic difference between “a disk” and “the in-memory heap of another node on the network”, in terms of what fault-tolerance guarantees you get by adding one or the other to your system. Machines can crash? So can disks. Back up to a disk? Back up to another node. Restore from on-disk state? Restore by streaming state from another node.

You can build a DBMS cluster, or even something “ultra-durable” like S3, out of clusters of disk servers, sure. Or, provided memory is cheap enough for your use case, you can build it out of clusters of RAM nodes, with no disks at all (and thus a tenth of the maintenance costs). Power outage wipes the whole cluster? Not if you’ve got a UPS, a generator, and a whole lotta gas (like telecom base-station switches are stocked with). Or if you’ve got backup nodes, which are also just RAM nodes, to stream to in another region. (Your system itself doesn’t need to be multi-region distributed; it can treat these nodes the same way a regular persist-to-disk system treats tape backup.)

And, conveniently, it’s not just memory accesses that are faster with such an architecture. Suppose you build your cluster as a set of services where each service isn’t a single node (or just a load-balanced set of nodes) but rather a “distribution set” of nodes: one “transaction router” node that multicasts commands to a set of equivalent worker nodes, all of which deterministically do the same things in response. Then, rather than one master, you have N copies of your node, each an equivalent hot standby of your “master”. (You don’t have to use them as such; you can route client requests to just one “master” in the distribution set, making the others merely warm standbys.) Compared to streaming replication, where some of your nodes are always “behind” their master, this approach is far less risky in terms of data loss. This isn’t streaming replication; it’s RAID for memory writes!
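
In Erlang terms, the shape is something like the following sketch (purely illustrative; apply_command/2 stands in for whatever deterministic state transition your service performs):

    %% The router multicasts every command to all replicas, so their
    %% states evolve in lockstep as long as apply_command/2 is
    %% deterministic. Only the designated "master" replica answers.
    router(Replicas = [Primary | _]) ->
        receive
            {command, From, Cmd} ->
                [R ! {apply, Cmd} || R <- Replicas],
                Primary ! {reply_to, From},
                router(Replicas)
        end.

    replica(State) ->
        receive
            {apply, Cmd} ->
                replica(apply_command(Cmd, State));
            {reply_to, From} ->
                From ! {ok, State},
                replica(State)
        end.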

(And yes, this is the precise use case that Erlang’s distribution protocol is designed for: nodes in the same “distribution set” working as equivalent, deterministic warm-standby alternatives to one another, presenting as a single virtual “node” in a larger cluster architecture, almost always connected by a non-partitionable network backplane such as a single top-of-rack switch. Of course, nobody uses Erlang’s distribution this way other than Ericsson... but that’s because nobody but Ericsson has needs that force them to avoid disks, and avoiding disks entirely isn’t cheap, at least today. Still, it’s important to keep in mind, because the Erldist protocol works best when you hew to this design and actively fights you when you don’t; and because Mnesia was never designed for anything other than being a memory-to-memory replicated DBMS (it works great for that!) but it really kind of sucks once you introduce any disk copies to its replication strategy.)
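
For the curious, a purely memory-to-memory Mnesia table is just a table whose only copies are ram_copies. A hedged sketch (table name and attributes are illustrative, and the other nodes are assumed to be connected and running Mnesia):

    %% Replicate the table into the heap of every connected node;
    %% no schema on disc, no disc_copies anywhere.
    ok = mnesia:start(),
    {atomic, ok} = mnesia:create_table(session,
        [{ram_copies, [node() | nodes()]},
         {attributes, [id, data]}]).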



