He didn't really explain it, but I think what was going on is that the rate limiting is done per account, and the race condition was a way to circumvent it. He had to make all the requests very quickly because the first thing each request does is check whether new requests for this account should be ignored. All the requests arrive around the same time, they all make this check and decide they are valid, and only then do they each record that an attempt was made for that account (locking it).
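In other words it's a check-then-act race. A toy sketch of the vulnerable flow (the in-memory counter and the sleep are just stand-ins for the real database reads and writes):

    import threading
    import time

    MAX_ATTEMPTS = 5
    attempts = 0                # stand-in for the per-account attempt counter
    evaluated_guesses = []

    def handle_reset_attempt(guess):
        global attempts
        # 1. check whether new attempts for this account should be ignored
        if attempts >= MAX_ATTEMPTS:
            return
        time.sleep(0.1)                   # stand-in for work done before recording
        evaluated_guesses.append(guess)   # 2. the guess actually gets checked
        attempts += 1                     # 3. only now is the attempt recorded

    threads = [threading.Thread(target=handle_reset_attempt, args=(g,)) for g in range(50)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # All 50 guesses slip past a limit of 5, because every request passed the
    # check before any request recorded its attempt.
    print(len(evaluated_guesses), "guesses evaluated; limit was", MAX_ATTEMPTS)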
The rate limiting for IPs is probably global (not related to the reset endpoint).
I think you are dead on, yeah, it's the quick burst of a large number of requests that avoids the per-account rate limiting. Curious how they resolved this. Run all authentication requests for a given user serially, through some single consolidation point? Take an exclusive lock on the relevant DB record before checking the code and recording the failure?
Yeah, my first thought was to make every attempt acquire some per-user lock with a timeout. It's pretty much the same thing. Either one would have a negligible effect on legitimate requests and would solve the problem.
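A rough sketch of the row-lock version against PostgreSQL (table, column names, and the timeout are made up for illustration); the FOR UPDATE serializes concurrent attempts for one account, so they can't all pass the check before any of them records a failure:

    import psycopg2

    MAX_ATTEMPTS = 5

    def try_reset_code(conn, account_id, submitted_code):
        with conn:  # psycopg2: commit on success, roll back on exception
            with conn.cursor() as cur:
                # Bound how long a request waits for the lock.
                cur.execute("SET LOCAL lock_timeout = '2s'")
                # Exclusive-lock this account's row; concurrent attempts queue
                # here and run one at a time instead of racing past the check.
                cur.execute(
                    "SELECT code, attempts FROM reset_codes"
                    " WHERE account_id = %s FOR UPDATE",
                    (account_id,),
                )
                row = cur.fetchone()
                if row is None:
                    return False
                code, attempts = row
                if attempts >= MAX_ATTEMPTS:
                    return False
                cur.execute(
                    "UPDATE reset_codes SET attempts = attempts + 1"
                    " WHERE account_id = %s",
                    (account_id,),
                )
                return code == submitted_code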
Could start by incrementing the value, then checking whether it's still below the threshold, similar to an atomic fetch_add operation. PostgreSQL has a RETURNING clause, SQL Server has an OUTPUT clause, etc.
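A minimal sketch of that increment-first pattern with PostgreSQL's RETURNING (the reset_attempts table is hypothetical, with account_id as its primary key); the bump and the read are a single atomic statement, so concurrent requests each see a distinct count:

    import psycopg2

    MAX_ATTEMPTS = 5

    def attempt_allowed(conn, account_id):
        with conn:
            with conn.cursor() as cur:
                # Insert the first attempt, or atomically bump the existing
                # counter, and read back the new value in the same statement.
                cur.execute(
                    """
                    INSERT INTO reset_attempts (account_id, attempts)
                    VALUES (%s, 1)
                    ON CONFLICT (account_id)
                    DO UPDATE SET attempts = reset_attempts.attempts + 1
                    RETURNING attempts
                    """,
                    (account_id,),
                )
                (attempts,) = cur.fetchone()
        return attempts <= MAX_ATTEMPTS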
Yeah, it is hard. The enforcement would need to happen on a single backend. Not every user needs to have their auth handled by the same specific backend, but each individual user's auth should always go to the same backend (or the same concurrency domain, if the architecture uses distributed locking).
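A trivial sketch of that kind of sticky per-user routing (the backend names are invented; a real setup would want consistent hashing so the mapping survives pool changes):

    import hashlib

    AUTH_BACKENDS = ["auth-1.internal", "auth-2.internal", "auth-3.internal"]

    def backend_for(user_id: str) -> str:
        # Stable hash so a given user's auth always lands on the same backend,
        # which is then the single place its per-account limit is enforced.
        digest = hashlib.sha256(user_id.encode()).digest()
        return AUTH_BACKENDS[int.from_bytes(digest[:8], "big") % len(AUTH_BACKENDS)]

    print(backend_for("alice@example.com"))  # same backend every time for this user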