Presumably you want your service to be up. If you can smooth the load out to the other nodes, your service stays up. With a bad strategy, you start losing capacity as nodes get knocked out.
Let's say each node can handle 5k rps and you have 10 nodes, so you can handle 50k rps total. If you're receiving 40k rps, a good strategy puts each node at 80% capacity. A bad strategy knocks a node out, reducing your total capacity, which puts extra pressure on the rest of the system, causing more failures and more pressure. This cascading failure is often called a thundering herd.
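For concreteness, here's a toy simulation of those numbers (the weights and the biased split are made up for illustration, not from TFA): an even spread keeps all 10 nodes at 80%, while a biased spread pushes a few nodes past capacity, they drop out, and the remaining capacity spirals to zero.

    NODE_CAPACITY = 5_000
    NUM_NODES = 10
    OFFERED_LOAD = 40_000

    def surviving_nodes(weights):
        """Spread OFFERED_LOAD by `weights`, drop any node pushed past capacity, repeat."""
        nodes = list(weights)
        while nodes:
            total_weight = sum(nodes)
            per_node = [OFFERED_LOAD * w / total_weight for w in nodes]
            healthy = [w for w, load in zip(nodes, per_node) if load <= NODE_CAPACITY]
            if len(healthy) == len(nodes):      # nothing got knocked out; stable
                return len(nodes)
            nodes = healthy                     # knocked-out nodes stop taking traffic
        return 0

    print(surviving_nodes([1] * NUM_NODES))     # even spread: 10 nodes at 80%, all survive
    print(surviving_nodes([3, 3, 3] + [1] * 7)) # biased spread: hot nodes die, then the rest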
At some point, your only option is load shedding. But with a bad LB strategy you start shedding load much earlier than you need to, which is a bad experience for customers that a good LB strategy avoids.
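A sketch of what shedding at the frontend can look like, assuming a crude fixed-window counter and the 5k rps per-node budget from above (none of this is prescribed by TFA):

    import time

    RPS_BUDGET = 5_000          # assumed per-node capacity from the example above

    _window_start = 0.0
    _count = 0

    def admit():
        """Return True if this request fits in the current one-second budget, else shed it."""
        global _window_start, _count
        now = time.monotonic()
        if now - _window_start >= 1.0:   # start a fresh one-second window
            _window_start = now
            _count = 0
        if _count >= RPS_BUDGET:
            return False                 # shed: answer 503 / "retry later" instead of queueing
        _count += 1
        return True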
Fair. Our solution is to make sure a biased load balancer (like in TFA) isn't sending the workload towards a select few machines while others may not be as overworked, in as simple a way as possible.
We run load balancers in a fail-open mode. As in, if every backend is excusing itself, then none are excused.
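Roughly, the pick looks like this (a minimal sketch of the fail-open idea, not our actual code; the health-check details are assumed):

    import random

    def pick_backend(backends, is_healthy):
        """Choose a backend at random, preferring healthy ones but failing open."""
        healthy = [b for b in backends if is_healthy(b)]
        pool = healthy if healthy else backends   # fail open: if everyone excuses itself, no one is excused
        return random.choice(pool)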
But as you point out, load balancing is a hairy beast unto itself.
I don't see why this is a problem? This is when we should start rejecting requests on the frontend.