
> How does systemd handle a network partition?

The same way K8s does... it doesn't. K8s management nodes just stop scheduling anything until the worker nodes are back. When they're back, the job controller picks up exactly where it left off, monitoring the jobs and correcting any problems.
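That "correcting any problems" step is just the standard controller reconcile loop: compare desired state to observed state and emit whatever actions close the gap. A toy sketch of the idea (all names here are hypothetical, not the real Kubernetes API):

```python
# Toy reconcile loop in the spirit of a K8s job controller.
# Compares desired state to observed state and returns corrective
# actions; running it again after a partition heals converges the
# cluster back to the desired state.

def reconcile(desired_replicas: int, running_pods: list[str]) -> list[str]:
    """Return the actions needed to converge observed state to desired."""
    actions = []
    # Too few pods running: schedule replacements.
    for i in range(len(running_pods), desired_replicas):
        actions.append(f"start pod-{i}")
    # Too many pods running: stop the extras.
    for pod in running_pods[desired_replicas:]:
        actions.append(f"stop {pod}")
    return actions

# After an outage left only one of three pods alive:
print(reconcile(3, ["pod-0"]))  # -> ['start pod-1', 'start pod-2']
```

The controller doesn't need to know *why* state drifted (partition, crash, manual deletion); rerunning the same loop handles all of them.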

> How does systemd mount storage

Systemd has a Mount unit configuration that manages system mounts.
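For instance, a mount unit file looks like this (device and mount point are made up for illustration; the file name must match the mount path, so `/mnt/data` is managed by `mnt-data.mount`):

```ini
# /etc/systemd/system/mnt-data.mount  (hypothetical example)
[Unit]
Description=Data volume

[Mount]
What=/dev/sdb1
Where=/mnt/data
Type=ext4
Options=defaults

[Install]
WantedBy=multi-user.target
```

systemd then handles ordering (e.g. starting services only after the mount succeeds) the same way it does for any other unit dependency.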

> or direct network connections to the right nodes

The same way K8s does... use some other load balancer. Most people point a cloud load balancer at an Ingress controller (e.g. Nginx) or at a NodePort. The former is effectively an Nginx service listening on ports 80/443 and load-balancing to whichever nodes the service is running on, which you can replicate by having services push a dynamic-config update to Nginx when they start.
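The Nginx side of that setup is just an upstream pool; a minimal sketch (addresses and ports are assumptions, and in practice the `upstream` block would be regenerated and reloaded when services register):

```nginx
# Hypothetical upstream pool, rewritten whenever a service
# instance starts or stops and nginx is reloaded.
upstream app_backends {
    server 10.0.0.11:8080;  # node A (assumed address)
    server 10.0.0.12:8080;  # node B (assumed address)
}

server {
    listen 80;
    location / {
        proxy_pass http://app_backends;
    }
}
```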

> How do you migrate physical hosts without downtime?

The same way K8s does... stop the services on the node, remove the node, add a new node, start the services there. The load balancer keeps sending connections to the other nodes until services are responding on the new one.



> The same way K8s does... it doesn't. K8s management nodes just stop scheduling anything until the worker nodes are back

This is not entirely accurate. K8s makes it possible to structure your cluster in such a way that it can tolerate certain kinds of network partitions. In particular, as long as:

* there is a majority of etcd nodes that can talk to each other

* there is at least one instance of each of the other important daemons (e.g. the scheduler) that can talk to the etcd quorum

then the control plane can keep running. So the cluster administrator controls the level of fault tolerance by deciding how many instances of those services to run, and where. For instance, if you spread 3 etcd members and 2 schedulers across different racks, the cluster can continue scheduling new pods even if an entire rack goes down.
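The arithmetic behind "a majority of etcd nodes" is the standard Raft majority rule, which is why etcd clusters are sized at odd numbers:

```python
# Quorum arithmetic for an etcd cluster (standard Raft majority rule,
# not K8s-specific code).

def quorum(members: int) -> int:
    """Votes needed for a majority of `members` etcd nodes."""
    return members // 2 + 1

def tolerated_failures(members: int) -> int:
    """Nodes that can be lost while a majority remains reachable."""
    return members - quorum(members)

for n in (1, 3, 5):
    print(f"{n} members: quorum={quorum(n)}, "
          f"tolerates {tolerated_failures(n)} failure(s)")
```

Note that going from 3 to 4 members raises the quorum from 2 to 3 without tolerating any additional failures, which is why even-sized clusters buy you nothing.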

If you assign the responsibility for your cluster to a single "parent" node, you're inherently introducing a point of failure at that node. To avoid that point of failure, you have to offload the state to a replicated data store -- which is exactly what K8s does, and which leads to many of the other design decisions that people call "complicated".



