You're not taking data into account; you're only talking about features. What about when the data no longer fits on one machine? Or when processing the data exceeds the machine's capacity?
Data growth through user growth or just normal day-to-day usage is expected.
If Twitter's data can fit on one machine, then the data of 99.99% of companies can. Not every product needs a billion users with gigabytes of storage each. The assumption that if your startup's tech isn't scalable enough to become the next Google, then it's the wrong tech, is hilarious nonsense driven largely by ego fantasies.
Nope. It's not Tweets that generate that data. It's the insane amount of (mostly unnecessary) noise that gets thrown into the mix: analytics, logs, metrics, you name it.
Every time you scroll, Twitter sends multiple events to the server. That alone will generate a large chunk of those petabytes.
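For a rough sense of the scale that implies (every number below is an assumption of mine, not a Twitter figure):

    # Hypothetical client-event volume, back-of-envelope
    daily_active_users = 250e6    # assumed, in the ballpark of Twitter's reported mDAU
    events_per_user_day = 200     # assumed: impressions, scroll pings, dwell events
    bytes_per_event = 1000        # assumed: ~1 KB of JSON per event
    per_year = daily_active_users * events_per_user_day * bytes_per_event * 365
    print(per_year / 1e15)        # ~18 PB/yr of raw event data

Under these assumptions, client telemetry dwarfs the tweet text itself, even if the guesses are off by an order of magnitude.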
No, they don't. In spite of the confusing wording in the post you cite, its petabytes/year claim is not derived from the 500M tweets/day claim – it must include metadata and/or multimedia.
This was all already derived (correctly) in the original post. Recapitulating: 500M tweets/day is roughly 183 billion tweets/year. Even if every tweet maxed out the 280-character limit, the raw text would come to only ~51 TB/yr, and in practice most tweets are far shorter, with just a long tail approaching the limit.
Assuming compression and variable-length encoding of this long tail in colder storage, it's more likely <20 TiB/yr (≤115 B/tweet on average).
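A quick sanity check on those numbers (the figures come from the thread; only the script is mine):

    # 500M tweets/day, figures from this thread
    tweets_per_year = 500e6 * 365          # ~1.8e11 tweets/yr
    raw_ceiling = tweets_per_year * 280    # every tweet at the full 280-char cap
    print(raw_ceiling / 1e12)              # ~51 TB/yr, the absolute upper bound
    compressed = tweets_per_year * 115     # <=115 B/tweet after compression
    print(compressed / 2**40)              # ~19 TiB/yr, i.e. <20 TiB/yr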
Yes, this excludes analytics metadata, without which, as you suggest, Twitter's current ad products couldn't function. But your core repeated claim about tweets alone is two orders of magnitude off.
I wonder if the "Petabytes" figure being claimed includes pictures/videos that can be attached to a Tweet. In that case, I could easily see "Petabytes/year" being accurate.
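Rough math says it doesn't take much media for that to happen (the attachment rate and file size below are pure guesses on my part):

    # Hypothetical media volume (both parameters assumed)
    tweets_per_day = 500e6
    media_rate = 0.01             # assume only 1% of tweets carry an attachment
    avg_media_bytes = 2e6         # assume ~2 MB per image/video
    per_year = tweets_per_day * media_rate * avg_media_bytes * 365
    print(per_year / 1e15)        # ~3.7 PB/yr from media alone

So even a low attachment rate puts media well into petabytes/year.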