Agree re noisy neighbours, but autoscaling depends on _requests_ rather than _li...

EdSchouten · on March 26, 2020

The problem with having no throttling is that the system will just keep on running happily, until you get to the point where resources become more limited. You will not get any early feedback that your system is constantly underprovisioned. Try doing this on a multi-tenant cluster, where new pods spawned by other teams/people come and go constantly. You won't be able to get any reliable performance characteristics in environments like that.

For such clusters, it's necessary to set up stuff like the LimitRanger (https://kubernetes.io/docs/concepts/policy/limit-range/) to put a hard constant bound between requests and limits.

dilyevsky · on March 26, 2020

And how will you get feedback on being throttled other than shit is randomly failing e.g connection timeouts?

snupples · on March 27, 2020

Effective monitoring. Prometheus is free and open source. There are other paid options.

dilyevsky · on March 27, 2020

That was a trick question actually - use your Prometheus stack to alert on latency sensitive workload with usage over request and ignore everything else.

snupples · on March 27, 2020

Of course, you're missing the point. Depending on your application a little throttling doesn't hurt, and it can save other applications running on the same nodes that DO matter.

In the meantime you can monitor rate of throttling and rate of CPU usage to limit ratio. Nothing stops you from doing this while also monitoring response latency.

On the other hand CPU request DOES potentially leave unused CPU cycles on the table since it's a reservation on the node whether you're using it or not.

Again needs may vary.

dilyevsky · on March 27, 2020

You got it completely backwards. Request doesn’t leave unused cpu as it is cpu.shares, limit does being cfs quota that completely prevents your process from scheduling even if nothing else is using cycles. Don’t believe me? here’s one of kubernetes founders saying same thing - https://www.reddit.com/r/kubernetes/comments/all1vg/comment/...

snupples · on March 27, 2020

Incorrect. If a node has 2 cores and the pods on it have request of 2000m nothing else will schedule on that node even if total actual usage is 0.

You can overprovision limit.

This is easy to test for yourself.

rumanator · on March 27, 2020

> Agree re noisy neighbours, but autoscaling depends on _requests_ rather than _limits_, so you could define requests for HPA scaling but leave out the limits and have both autoscaling and no throttling.

I've just checked Kubernetes' docs and I have to say you are absolutely correct. Resource limits are used to enforce resource quotas, but not autoscaling.