Hacker News

The most literal conclusion to draw from this story is that MRU connection pools shouldn’t be used for connections that traverse aggregated links.

Not just connections which traverse aggregated links. If you're load-balancing between multiple database replicas -- or between several S3 endpoints -- this sort of MRU connection pool will cause metastable load imbalances on the targets.

What I don't understand is why Facebook didn't simply fix their MRU pool: Switching from "most recent response received" to "most recent request sent" (out of the links which don't have a request already in progress, of course) would have flipped the effect from preferring overloaded links to avoiding them.
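A minimal sketch of the distinction, with entirely hypothetical names (the real pool is not public): the only change is which timestamp orders the idle list. Keying on the time the last request was *sent* means connections on a congested link, which were dispatched early but answer late, sink toward the back instead of floating to the front.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Conn:
    fd: int
    last_request_sent_at: float  # hypothetical bookkeeping field

class MruBySendPool:
    """Toy MRU pool ordered by "most recent request sent" rather than
    "most recent response received"."""
    def __init__(self):
        self._idle = []

    def release(self, conn):
        self._idle.append(conn)

    def acquire(self):
        # Prefer the idle connection whose last request went out most
        # recently; slow links answered late but were sent to early,
        # so they are picked last instead of first.
        best = max(self._idle, key=lambda c: c.last_request_sent_at)
        self._idle.remove(best)
        return best
```

The original "most recent response received" policy is the same code with `release` stamping the current time instead, which is exactly what makes a slow link's just-arrived connection the next one reused.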



I coded the fix. We considered some options that had a bias away from the slow links, but we thought it was safer to avoid bias completely. It was fine in my tests, but I couldn't prove that biasing toward the faster link in a 2-link setup wouldn't set up an oscillation.


I wondered if that was it, but a 1/RTT loading behaviour seems safe enough to me -- after all, that's exactly what TCP does, and you're presumably not seeing any uncontrolled oscillations from that.

I guess it depends how much minimizing the size of your connection pool matters.


Great reasoning. Problem: bias isn't balanced, causing complex failure. Solution A: change the bias. Solution B: remove all bias.

One of these options makes things simpler. I'm always in favor of that.


I am not sure that would work either. If I understand correctly, the root cause of delayed responses is that more requests were sent on that link due to hash collisions, which delays connections on that link. So, wouldn't "most recent request sent" correlate well with "most recent response received"?

EDIT: Also, this seems like a classic load balancing problem: Simply picking the least loaded connection would have been sufficient. The response time on each connection could be computed either explicitly (a running average/standard deviation of RPC finish times) or implicitly by checking the queue backlog at any time (the queue backlog being a first-order statistic).
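The implicit version can be sketched in a few lines; this is a hypothetical least-outstanding-requests balancer, not anyone's actual code, using the in-flight count as the backlog statistic:

```python
class LeastLoadedPool:
    """Toy balancer: pick the connection with the fewest requests
    currently in flight."""
    def __init__(self, conns):
        self.in_flight = {c: 0 for c in conns}

    def acquire(self):
        # Smallest backlog wins; a congested link's connections hold
        # their requests longer, so they naturally get picked less.
        conn = min(self.in_flight, key=self.in_flight.get)
        self.in_flight[conn] += 1
        return conn

    def release(self, conn):
        self.in_flight[conn] -= 1
```

Note this still biases away from slow links, which is exactly the oscillation concern the fix's author raises above.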


But the cache is oblivious to which links the TCP connections are going over (well, except via latency & throughput as the post showed). If I understand what you propose, they need to add more bookkeeping to track what the connection did, instead of just doing pool.insert(conn_fd, time.now()). It sounds like the fix was just swapping the side you evict from, which could have been as little as a one line change.
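A toy illustration of that "swap the side you evict from" flavor of fix, assuming a deque-backed idle list (hypothetical code, not Facebook's):

```python
from collections import deque

idle = deque()

def release(conn):
    idle.append(conn)        # returned connections go on the right

def acquire_mru():
    return idle.pop()        # LIFO: reuse the most recently returned (old policy)

def acquire_fifo():
    return idle.popleft()    # FIFO: cycle through every connection (the fix)
```

No per-connection bookkeeping is needed; the pool stays oblivious to which link a connection rides on.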

Without a bit more information I think I am just speculating and don't really know. Their solution doesn't really taste bad to me though.


Just guessing, but their MRU probably worked by pushing an incoming free connection onto the top of a LIFO stack, while the "most recent request sent" strategy would require tracking request timing and inserting incoming free connections into the middle of the stack, complicating things.

With their solution, they just had to convert the LIFO stack into a FIFO queue, keeping it simple.


stack implies LIFO and queue implies FIFO. Otherwise they wouldn't have different names. :)


well, "FIFO stack" still exists as a synonym for queue, but you're right, "stack" and "queue" would have made my point well enough :) anyway, looking at newer answers it seems that topic was not a concern for their solution...


That was also my first idea about solving it. No idea why they didn't take this route.


I want to know why they're violating the end-to-end principle in their switches.



