Hacker News

philwelch · on May 5, 2020

Cynical tone aside, there’s a good question here. It’s just that the question has an actual, valid answer: it’s impossible to have a single database that operates at Twitter scale.

If you think it is possible to have a single database that operates at Twitter scale, fine, there’s probably an interesting and enlightening conversation to be had about how and why that is or is not the case.

Continue along this vein and you eventually get to the point where you’re discussing realistic solutions, and maybe at the end of it you’ve either gained an understanding of how these things work or else you’ve actually come up with a better system design than Twitter. Either way you’ve gained something more valuable than the petty satisfaction of disparaging other people’s motivations.

devonkim · on May 5, 2020

Deletion is a form of cache invalidation if you think about it

flarg · on May 5, 2020

Thank you. And this sort of problem occurs in large organisations with lots of different monoliths all caching each other's data.

_y5hn · on May 5, 2020

Records storing transactional facts are NOT "caching each other's data".

flarg · on May 5, 2020

Maybe you're right, but, out of interest, what would you call it when half a dozen systems store elements of each other's data and then refresh that data on a daily basis?

_y5hn · on May 6, 2020

If tied to an event: facts

If synced and overwritten between systems on a daily basis: liability

hinkley · on May 5, 2020

If not the King, at least the Crown Prince of cache invalidation.

d_watt · on May 5, 2020

What if you're Twitter? Which this person is.

I agree that if you can keep it simple, it's easier to do it. But sometimes you need distributed services. Saying only Google has that problem is a little reductive.

namanaggarwal · on May 5, 2020

This article is about microservices. If you are not at scale you might not need microservices in the first place.

When data is distributed, with one team/service owning user data and another owning tweets, it becomes not so trivial.

AmericanChopper · on May 5, 2020

This particular article is about microservices, but there are plenty of ordinary business reasons, unrelated to scale, why you may have some sort of asynchronous business process that runs across a distributed set of systems/teams/organisations. I was working on a microservice recently (really it was a service-oriented architecture, but they seem to pretty much mean the same thing now), and it only processed around 10,000 transactions per day. But it almost had to be designed that way, due to the nature of the business processes it was supporting, and the systems it had to interface with.

pfranz · on May 5, 2020

Someone already mentioned cache invalidation. To extend that, I don't think it's all that different from an old paper system. If you want to delete your file it's probably kept in a cabinet in some department, but the billing department or marketing department also has a copy of your name and address in their records. Deleting everything is a multi-step process.
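
As a rough sketch of that multi-step process (everything here is hypothetical: the `Department` class, its `available` flag, and `delete_everywhere` are made-up names for illustration, not anyone's real system), each department holds its own copy, and a delete only counts as finished once every copy is gone:

```python
from dataclasses import dataclass, field


@dataclass
class Department:
    """Hypothetical system that keeps its own copy of customer records."""
    name: str
    records: dict = field(default_factory=dict)
    available: bool = True  # assumption: False simulates an outage

    def delete(self, customer_id: str) -> bool:
        """Delete the local copy. Idempotent, so retrying is safe."""
        if not self.available:
            return False
        self.records.pop(customer_id, None)
        return True


def delete_everywhere(customer_id, departments):
    """Ask every department to delete; return the names still pending."""
    return [d.name for d in departments if not d.delete(customer_id)]


records = Department("records", {"c1": "full file"})
billing = Department("billing", {"c1": "name, address"}, available=False)
marketing = Department("marketing", {"c1": "name, address"})
departments = [records, billing, marketing]

# First pass: billing is unreachable, so its copy survives.
pending = delete_everywhere("c1", departments)

# Retry once billing is back; repeating the delete elsewhere is harmless.
billing.available = True
pending_after_retry = delete_everywhere("c1", departments)
```

The point is just that "delete" turns into "track who still has a copy and keep retrying", which is the part that stops being trivial.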

Centralized systems didn't scale in the physical or digital world, and distributed systems complicate things that seem trivial.