That's not what Go does, though. Go looks at the population of the CPU mask at startup and never looks again, which is problematic in K8s, where the visible CPUs may change while your process runs.
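For illustration, here's a minimal sketch of that difference (Linux-only, leaning on golang.org/x/sys/unix, which is not something stock Go does for you): runtime.NumCPU reports the count sampled at startup, while the affinity mask can be re-queried at any time and may have drifted since.

    package main

    import (
        "fmt"
        "runtime"

        "golang.org/x/sys/unix"
    )

    func main() {
        // runtime.NumCPU is sampled once at process startup and cached.
        fmt.Println("NumCPU at startup:", runtime.NumCPU())
        fmt.Println("GOMAXPROCS:       ", runtime.GOMAXPROCS(0))

        // The affinity mask, by contrast, can be re-read at any time and may
        // have changed underneath the process since startup.
        var set unix.CPUSet
        if err := unix.SchedGetaffinity(0, &set); err != nil {
            panic(err)
        }
        fmt.Println("CPUs in the affinity mask right now:", set.Count())
    }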
We use https://github.com/uber-go/automaxprocs after we joyfully discovered that Go assumed any particular pod had the entire node's CPU count. Made for some very strange performance characteristics in goroutine scheduling.
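For anyone who hasn't used it, the wiring is the blank-import pattern from the project's README: at init time it reads the container's CPU quota and sets GOMAXPROCS to match.

    package main

    import (
        "fmt"
        "runtime"

        _ "go.uber.org/automaxprocs" // adjusts GOMAXPROCS from the cgroup CPU quota at init
    )

    func main() {
        // With a pod CPU limit of 2, this prints 2 rather than the node's CPU count.
        fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))
    }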
My opinion is that setting GOMAXPROCS that way is quite a poor idea. It tends to strand resources that could have been used to handle a stochastic burst of requests, and with a capped GOMAXPROCS that burst is converted directly into latency. I can think of no good reason why GOMAXPROCS needs to be 2 just because you expect the long-term CPU rate to be 2. That long-term quota is an artifact of capacity planning, while GOMAXPROCS is an artifact of process architecture.
If you have a pod with Burstable QoS, perhaps because it has a request but not a limit, its CPU mask will be populated by every CPU on the box, less one for the Kubelet and other node services, less all the CPUs requested by pods with Guaranteed QoS. Pods with Guaranteed QoS will have exactly the number of CPUs they asked for, no more and no less, and consequently their GOMAXPROCS is consistent. Everyone else will see more or fewer CPUs as Guaranteed pods arrive on and depart from the node.
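If you wanted a Burstable pod to track that churn, one purely hypothetical option is a background loop that re-reads the affinity mask and nudges GOMAXPROCS when the set of visible CPUs changes; nothing in stock Go or automaxprocs does this for you (sketch below uses golang.org/x/sys/unix, Linux-only).

    package main

    import (
        "log"
        "runtime"
        "time"

        "golang.org/x/sys/unix"
    )

    // watchAffinity polls the process's CPU affinity mask and resizes
    // GOMAXPROCS when the number of visible CPUs changes.
    func watchAffinity(interval time.Duration) {
        for {
            var set unix.CPUSet
            if err := unix.SchedGetaffinity(0, &set); err == nil {
                if n := set.Count(); n > 0 && n != runtime.GOMAXPROCS(0) {
                    log.Printf("affinity mask changed, setting GOMAXPROCS=%d", n)
                    runtime.GOMAXPROCS(n)
                }
            }
            time.Sleep(interval)
        }
    }

    func main() {
        go watchAffinity(30 * time.Second)
        select {} // stand-in for the real server
    }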
Perhaps it is an artifact of our respective container runtimes. For me, taskset shows just 1 visible CPU in a Guaranteed QoS pod with limit=request=1.
Even way back in the day (1996) it was possible to hot-swap a CPU. Used to have this Sequent box, 96 Pentiums in there, 6 on a card. Could do some magic, pull the card and swap a new one in. Wild. And no processes died. Not sure if a process could lose a CPU then discover the new set.
It's not a bad way to guess, up to maybe 16 or so. Most Go server programs aren't going to just scale up forever, so having 188 threads might be a waste.
There's going to be a bunch of missing info, though, in some cases. For example, more and more systems have asymmetric cores. /proc/cpuinfo can expose that information in detail, including (current) clock speed, processor type, etc., while cpu_set is literally just a bitmask (if I read the man pages right) of the system cores your process is allowed to schedule on.
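As a rough illustration of the extra detail that only /proc/cpuinfo carries: just collecting the reported clock speed per logical CPU already separates the core types on an asymmetric part (assuming the kernel exposes a "cpu MHz" field for your architecture, which e.g. some ARM machines do not).

    package main

    import (
        "bufio"
        "fmt"
        "os"
        "strings"
    )

    func main() {
        f, err := os.Open("/proc/cpuinfo")
        if err != nil {
            panic(err)
        }
        defer f.Close()

        // Print the reported clock speed for each logical CPU.
        cpu := ""
        scanner := bufio.NewScanner(f)
        for scanner.Scan() {
            line := scanner.Text()
            switch {
            case strings.HasPrefix(line, "processor"):
                cpu = strings.TrimSpace(strings.SplitN(line, ":", 2)[1])
            case strings.HasPrefix(line, "cpu MHz"):
                mhz := strings.TrimSpace(strings.SplitN(line, ":", 2)[1])
                fmt.Printf("cpu %s: %s MHz\n", cpu, mhz)
            }
        }
    }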
Fundamentally, intelligent apps need to interrogate their environment to make concurrency decisions. But I agree: Go would probably work best if it just picked a standard parallelism constant like 16 and let users know it can be tuned if they have additional context.
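Something like this hypothetical init block (the constant 16 and the "respect an explicit override" convention are just the ideas from this thread, not anything Go actually does):

    package main

    import (
        "os"
        "runtime"
    )

    const defaultParallelism = 16

    func init() {
        // If the operator set GOMAXPROCS explicitly, they have context we don't;
        // the runtime has already honored it, so leave it alone.
        if os.Getenv("GOMAXPROCS") != "" {
            return
        }
        // Otherwise cap at a fixed constant rather than trusting NumCPU blindly.
        n := runtime.NumCPU()
        if n > defaultParallelism {
            n = defaultParallelism
        }
        runtime.GOMAXPROCS(n)
    }

    func main() {
        // ... the actual server ...
    }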
Yes, running on a set of heterogeneous CPUs presents further challenges, for the program and the thread scheduler. Happily there are no such systems in the cloud, yet.
Most people are running on systems where the CPU capacity varies and they haven't even noticed. For example in EC2 there are 8 victim CPUs that handle all the network interrupts, so if you have an instance type with 32 CPUs, you already have 24 that are faster than the others. Practically nobody even notices this effect.
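You can get a feel for this on your own instance by summing the per-CPU interrupt counts from /proc/interrupts. A rough sketch below filters on "ena" (the EC2 Elastic Network Adapter driver name, which is an assumption about your NIC); the CPUs with large totals are the ones absorbing the network interrupts.

    package main

    import (
        "bufio"
        "fmt"
        "os"
        "strconv"
        "strings"
    )

    func main() {
        f, err := os.Open("/proc/interrupts")
        if err != nil {
            panic(err)
        }
        defer f.Close()

        scanner := bufio.NewScanner(f)
        scanner.Scan()
        ncpu := len(strings.Fields(scanner.Text())) // header row: CPU0 CPU1 ...
        totals := make([]uint64, ncpu)

        // Sum per-CPU counts for IRQ lines belonging to the NIC driver.
        for scanner.Scan() {
            line := scanner.Text()
            if !strings.Contains(line, "ena") {
                continue
            }
            fields := strings.Fields(line)
            for i := 0; i < ncpu && i+1 < len(fields); i++ {
                n, err := strconv.ParseUint(fields[i+1], 10, 64)
                if err != nil {
                    break // reached the non-numeric tail of the line
                }
                totals[i] += n
            }
        }
        for cpu, n := range totals {
            fmt.Printf("CPU%-3d %d\n", cpu, n)
        }
    }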
> in EC2 there are 8 victim CPUs that handle all the network interrupts, so if you have an instance type with 32 CPUs, you already have 24 that are faster than the others
Fascinating. Could you share any (or all) of the detail you know about this? Is it specific instance types, only ones that use Nitro (or only ones without)? This might be related to a problem I've seen in the wild but never tracked down...