Hacker News

You're not taking into account data, you're only talking about features. What about when the data no longer fits on the one machine? Or processing the data exceeds the capacity of the machine?

Data growth through user growth or just normal day-to-day usage is expected.



If Twitter's data can fit on one machine, then the data of 99.99% of companies can. Not every product needs a billion users with gigabytes of storage each. The assumption that if your startup's tech isn't scalable enough to become the next Google then it's the wrong tech is hilarious nonsense driven largely by ego fantasies.


It does not fit on one machine. Tweets alone generate petabytes of data a year, and other events are petabytes per day.

https://blog.twitter.com/engineering/en_us/topics/infrastruc...

https://ankush-chavan.medium.com/twitter-data-storage-and-pr...


> Tweets alone generate petabytes of data a year

Nope. It's not Tweets that generate that data. It's the insane amount of (mostly unnecessary) noise that gets thrown into the mix: analytics, logs, metrics, you name it.

Every time you scroll Twitter sends multiple events to the server. That alone will generate a large chunk of those petabytes.


No, that's the second link's figure: generated data, separate from tweets.

Tweets alone generate petabytes of data a year.

https://ankush-chavan.medium.com/twitter-data-storage-and-pr...

Also, many people would disagree that stuff required to run a business is "mostly unnecessary".


No, they don't. In spite of the confusing wording in the post you cite, its petabytes/year claim is not derived from the 500m tweets/day claim – it must include metadata and/or multimedia.

This was all already derived (correctly) in the original post. Recapitulating:

500m tweets/day * (conservatively) 512B/tweet * 365 days/yr ~= 90 TiB/yr

Assuming compression and variable-length encoding of this long tail in colder storage, it's more likely <20 TiB/yr (<=115B/tweet on average).
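The arithmetic above can be sanity-checked in a few lines. The figures (500M tweets/day, 512B/tweet uncompressed, ~115B/tweet compressed) are this thread's assumptions, not Twitter's published numbers; the raw estimate comes out near 85 TiB/yr, which the comment rounds up to ~90:

```python
# Back-of-envelope check of the tweet-storage estimate.
# Assumed inputs (from the thread, not from Twitter):
TIB = 2**40                      # bytes per tebibyte
tweets_per_day = 500_000_000     # claimed tweet volume
bytes_uncompressed = 512         # conservative per-tweet size
bytes_compressed = 115           # assumed average after compression

uncompressed = tweets_per_day * bytes_uncompressed * 365 / TIB
compressed = tweets_per_day * bytes_compressed * 365 / TIB

print(f"uncompressed: ~{uncompressed:.0f} TiB/yr")  # ~85 TiB/yr
print(f"compressed:   ~{compressed:.0f} TiB/yr")    # ~19 TiB/yr
```

Either way, the result is tens of tebibytes per year, not petabytes.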

Yes, this excludes analytics metadata, which as you suggest would not support Twitter's current ad products. But your core repeated claim about tweets alone is two orders of magnitude off.


> 500m tweets/day * (conservatively) 512B/tweet * 365 days/yr ~= 90 TiB/yr

I wonder if the "Petabytes" figure being claimed includes pictures/videos that can be attached to a Tweet. In that case, I could easily see "Petabytes/year" be accurate.


Twitter's data cannot fit on one machine. In 2015, their Hadoop cluster was 30 PB, per earlier comments and their blog. How do you fit that on one machine?



