How so? I've seen caching clients exhibit some really weird behaviour under heavy load. It's not implausible that, e.g., the caching library doesn't do proper locking before writing, resulting in writes stomping all over each other.
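To make that concrete, here's a minimal sketch of the failure mode I'm describing. Everything in it is hypothetical (a made-up client, not any real library): a client that writes protocol frames to shared state without holding a lock, so concurrent writes interleave.

```python
import re
import threading
import time

# Hypothetical, stripped-down cache client -- none of these names come from
# any real library. It writes protocol frames to shared state without a lock.
class NaiveCacheClient:
    def __init__(self):
        self.wire = bytearray()  # stands in for the socket send buffer

    def set(self, key, value):
        frame = f"SET {key} {value}\r\n".encode()
        # A careful client would hold a lock around the whole frame write.
        # Splitting the write and pausing in between simulates a partial
        # socket write, which lets another thread's frame interleave with ours.
        self.wire += frame[:8]
        time.sleep(0.001)
        self.wire += frame[8:]

client = NaiveCacheClient()
threads = [threading.Thread(target=client.set, args=(f"key{i}", "val"))
           for i in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()

lines = client.wire.decode().split("\r\n")
corrupted = [l for l in lines if l and not re.fullmatch(r"SET key\d+ val", l)]
print(f"{len(corrupted)} corrupted frames out of 20")
```

Run it a few times and you'll usually see mangled frames; wrap the two writes in a threading.Lock and the count drops to zero. That's the kind of bug that can hide for months if writes are rare and bursts never line up.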
Caching is normally read heavy, not write heavy, so it's plausible this isn't something you'd see much under typical operation. After an outage, they'd be dealing with a thundering-herd level of traffic as everything tries to reconnect at once; that'd be very different from normal write loads, and even from the write load they'd have seen when they first enabled caching.
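For what it's worth, the usual client-side way to blunt that reconnect stampede is jittered exponential backoff. A rough sketch of the idea (the function and parameter names are mine, and connect is just a stand-in for whatever the real client does):

```python
import random
import time

# Sketch of the usual client-side mitigation for a reconnect stampede:
# exponential backoff with full jitter. `connect` is a stand-in callable
# for whatever the real client does to (re)establish its connection.
def reconnect_with_jitter(connect, base=0.5, cap=30.0, max_attempts=8):
    for attempt in range(max_attempts):
        try:
            return connect()
        except ConnectionError:
            # Full jitter: sleep a random amount up to the exponential cap,
            # so clients that dropped at the same moment don't all retry
            # (and re-fill the cache) at the same moment.
            time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
    raise ConnectionError("gave up reconnecting")
```

Without something like this, every client hammers the cache in lockstep the moment service comes back, which is exactly the write pattern they'd never have exercised before.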
Yes, but then either the library is seriously bugged (like expecting writes to be ordered and screwing things up if it gets too many writes for different objects at the same time), or there was some serious bug in their implementation. Anyway, the attitude and the message conveyed in the communication read like they're washing their hands of it. I might be too cynical, though.
How else would you say a third-party library had a bug under heavy load? 1. You don't want a defamation lawsuit coming your way. 2. If it was vendor code, you have a contract that may be under an NDA. 3. If it was a vendor, there were lawyers involved, lots and lots of lawyers, and they likely had to say the minimal amount. The fact that they sent out communications for each type of incident in such a short time was great.
I might be splitting hairs, but they say the incident was "caused by a third party library" when, in fact, it was caused by insufficient testing on their part.
It sounds like they're trying to shift blame for the incident, and then they pat themselves on the back for all the effort they put into security. It comes across as dishonest.
Technical details are appreciated, but they should've emphasized that this is their own fault. Bonus points if they commit to at least considering E2EE, which would sidestep the issue.