Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I am currently running over 13000 nodes with a 5 server cluster. These are large boxes (90gb memory, 64 cores) but cpu usage is under 20% and memory usage is under 30%.


I'm sorry, what is the exact benefit of running multiple nodes on one piece of hardware? Just software failure resilience?


I assume they meant a 5 server control plane supporting 13k worker nodes, not that they partitioned 5 large hosts into 13k smaller ones. It's a counterpoint to GP's "raft gets slow with more servers", I think.


This is correct.

We run a Raft cluster of 5 voting members with 13,000 physical Nomad nodes mixed across OS's with a bunch of random workloads using docker, exec, raw_exec, java, and some in-house drivers. I'll clarify that (and I can't... because I can't edit my post anymore :( ).


I don't see where you see multiple nodes. I'd read the parent post as: There are 5 identically sized, beefy VMs. Each of these 5 beefy VMs run 1 instance of the nomad server. And then, there is 13k other VMs or physical systems, which run nomad in client mode. These clients connect to the server to register and get allocated allocations / task groups / practically containers or vms or binaries to run.

We're nowhere near that size, but the architecture is probably identical.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: