We've used both round robin and least outstanding connections (LOC) in AWS and have run into plenty of really weird phenomena over the years.
In general, LOC is more forgiving for uneven workloads, especially for Ruby services. If a slow request hits, that machine receives less traffic than the others, which gives it some room to recover and burn off its queue.
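To make that concrete, here's a minimal sketch of the general least-outstanding-connections idea (not AWS's actual implementation; hostnames and counts are made up):

    def pick_host(in_flight):
        # Hypothetical least-outstanding-connections pick: route the next
        # request to the host with the fewest requests currently in flight.
        return min(in_flight, key=in_flight.get)

    # A host stuck on a slow request carries a higher in-flight count,
    # so new traffic drifts to the other hosts until its queue burns off.
    in_flight = {"host-a": 5, "host-b": 1, "host-c": 2}
    print(pick_host(in_flight))  # -> host-b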
Where we found weird problems was at the extremes: very low concurrency and exceptionally high concurrency.
Low concurrency meant that one host consistently had 2 connections and another consistently had 3. That's 50% more load on one host. I believe AWS has fixed this and now tie-breakers are resolved randomly, but it was kinda painful to deal with. Our solution was to scale vertically and run fewer hosts. 19 on one host vs 20 on another was less jarring.
The other extreme was 1k+ websocket connections per host. When we autoscaled, it would add a new host. That host would have 0 connections and so the next 1k connections would all go to the new host. For that system, we changed it back to round robin.
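That failure mode falls straight out of the least-connections rule: a freshly added host starts at zero outstanding connections, so it wins every pick until it catches up with the fleet, while round robin just cycles through targets regardless of existing counts. A rough illustration (hypothetical numbers, not the ALB algorithm):

    import itertools

    in_flight = {"old-1": 1000, "old-2": 1000, "new-1": 0}

    # Least outstanding connections: the new host wins every pick until it
    # has accumulated roughly as many connections as the existing hosts.
    print(min(in_flight, key=in_flight.get))   # -> new-1, every time

    # Round robin ignores the counts and cycles through the targets, so the
    # new host only picks up every third new connection.
    rotation = itertools.cycle(in_flight)
    print([next(rotation) for _ in range(3)])  # -> ['old-1', 'old-2', 'new-1']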
> Low concurrency meant that one host consistently had 2 connections and another consistently had 3. That's 50% more load on one host. I believe AWS has fixed this and now tie-breakers are resolved randomly, but it was kinda painful to deal with. Our solution was to scale vertically and run fewer hosts. 19 on one host vs 20 on another was less jarring.
I assume you're saying that this happened over a period much longer than the lifetime of a connection? If all of the connections are super long lived, I'm not really sure what else the load balancer could do, other than forcibly close one and make the client swap to the other server, but naively that doesn't really seem like something I'd want a load balancer to do.
Even more basic, wouldn't that just make the split 3:2 instead of 2:3 (aka, the same thing)? I don't think we can give each server 2.5 connections (though that would be cool!) ;P.
Yeah, I guess what I meant is that forcing one connection to swap back and forth at some interval would make it 1:1 "on average" over a longer period. But it seems presumptuous to think that would perform better: the cost of re-establishing a connection each time, plus communicating whatever "state" is needed to resume from the context that was executing before, doesn't seem like a good tradeoff (not to mention the huge increase in complexity of the client logic needed to handle it).
Alternate the incoming connections from one server to the other. That gives the one with two connections some time to clear its queue while the other queue may be getting longer.
The problem was that it was always the same machine that had the extra 50% traffic. The eventual fix from Amazon was to break ties randomly, so that each machine averaged about 2.1 concurrent requests (assuming 10 machines sharing 21 connections) instead of one machine permanently holding the extra one.
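In code terms, the difference is roughly the contrast below; this is a guess at the shape of the behaviour, not AWS's actual code:

    import random

    def pick_host_deterministic(in_flight):
        # Old behaviour as we observed it: ties always resolve to the same
        # host, so one machine permanently carries the extra connection.
        return min(in_flight, key=in_flight.get)

    def pick_host_random_tiebreak(in_flight):
        # Fixed behaviour: break ties randomly, so the extra connection
        # averages out across the fleet over time.
        fewest = min(in_flight.values())
        return random.choice([h for h, n in in_flight.items() if n == fewest])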
I can't seem to find the article, but I remember some FAANG - FB I think - having an excellent blog post about a very hard-to-track-down bug in their LB where performance fell off a cliff. They eventually tracked it down to an LRU and realised that the least loaded server was actually the least hot, and that performance was substantially better if they ran a small number of servers at capacity rather than a large number of servers at lower utilization.
Funny you mention that. When I worked there I identified a bug in an internal load balancer where new requests were being sent to the most overloaded hosts. It was far worse than if load balancing had been completely random and meant tens of thousands of servers were mostly sitting idle. The few lines of code to fix that bug likely amounted to the biggest financial impact I’ve had on a company in my whole career. Unfortunately, I didn’t have time to quantify the fix in monetary terms because I was in the midst of a reorg and the bug was totally unrelated to my main project. Kind of hilarious in retrospect to think about how such a giant problem went unnoticed for so long.
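That class of bug often comes down to a single inverted comparison, something like the contrast below (purely illustrative; I'm not claiming this is what the actual code looked like):

    def pick_host_buggy(load):
        # Inverted selection: new requests pile onto the busiest host.
        return max(load, key=load.get)

    def pick_host_intended(load):
        # Intended behaviour: route to the least loaded host.
        return min(load, key=load.get)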
By LRU you mean caching/preloading being terrible (and there being a lot of latency) because there wasn’t enough load to allow things to properly preload?