That was a trick question actually - use your Prometheus stack to alert on latency sensitive workload with usage over request and ignore everything else.
Of course, you're missing the point. Depending on your application a little throttling doesn't hurt, and it can save other applications running on the same nodes that DO matter.
In the meantime you can monitor rate of throttling and rate of CPU usage to limit ratio. Nothing stops you from doing this while also monitoring response latency.
On the other hand CPU request DOES potentially leave unused CPU cycles on the table since it's a reservation on the node whether you're using it or not.
You got it completely backwards. Request doesn’t leave unused cpu as it is cpu.shares, limit does being cfs quota that completely prevents your process from scheduling even if nothing else is using cycles. Don’t believe me? here’s one of kubernetes founders saying same thing - https://www.reddit.com/r/kubernetes/comments/all1vg/comment/...