I am currently running over 13000 nodes with a 5 server cluster. These are large...

martijn9612 · on Oct 15, 2021

I'm sorry, what is the exact benefit of running multiple nodes on one piece of hardware? Just software failure resilience?

alexeldeib · on Oct 15, 2021

I assume they meant a 5 server control plane supporting 13k worker nodes, not that they partitioned 5 large hosts into 13k smaller ones. It's a counterpoint to GP's "raft gets slow with more servers", I think.

chucky_z · on Oct 15, 2021

This is correct.

We run a Raft cluster of 5 voting members with 13,000 physical Nomad nodes mixed across OS's with a bunch of random workloads using docker, exec, raw_exec, java, and some in-house drivers. I'll clarify that (and I can't... because I can't edit my post anymore :( ).

tetha · on Oct 15, 2021

I don't see where you see multiple nodes. I'd read the parent post as: There are 5 identically sized, beefy VMs. Each of these 5 beefy VMs run 1 instance of the nomad server. And then, there is 13k other VMs or physical systems, which run nomad in client mode. These clients connect to the server to register and get allocated allocations / task groups / practically containers or vms or binaries to run.

We're nowhere near that size, but the architecture is probably identical.