For those who have reached vertical database write scaling limits and had to sta...

adventured · on Feb 5, 2020

> the biggest one is db.r5.24xlarge with 48 cores and 768 gb ram. I imagine that can take you quite a long way--perhaps even into millions of users territory

For reference, that would run Stack Overflow's db by itself, given sensible caching (they're very read-heavy and cache like crazy). Here's the hardware for their SQL Server boxes as of 2016:

2 Dell R720xd Servers featuring: Dual E5-2697v2 Processors (12 cores @2.7–3.5GHz each), 384 GB of RAM (24x 16 GB DIMMs), 1x Intel P3608 4 TB NVMe PCIe SSD (RAID 0, 2 controllers per card), 24x Intel 710 200 GB SATA SSDs (RAID 10), Dual 10 Gbps network (Intel X540/I350 NDC).

https://nickcraver.com/blog/2016/03/29/stack-overflow-the-ha...

bcrosby95 · on Feb 5, 2020

Very far, I would guess. 10 years ago we took a single bare-metal database server running MySQL, with 8 cores and 64 GB of memory, to 8 million daily users. That was 15k requests per second of per-user dynamic pages at peak load.

We did use memcached where we could.
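
For anyone who hasn't set this up before, the usual shape is cache-aside: check the cache first, fall back to the database on a miss, then store the result with a TTL. A minimal sketch, not the setup described above: `TTLCache` is a hypothetical in-process stand-in for memcached (not a real client library), and `load_from_db` is whatever function runs your query.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Hypothetical in-process stand-in for memcached; not a real client."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._items: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._items[key]
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._items[key] = (self._clock() + self._ttl, value)


def cached_read(cache: TTLCache, key: str,
                load_from_db: Callable[[str], str]) -> str:
    """Cache-aside read: only touch the database on a cache miss."""
    value = cache.get(key)
    if value is None:
        value = load_from_db(key)  # assumed to be the expensive query
        cache.set(key, value)
    return value
```

Every hit is a read the database never sees, which is most of why a read-heavy site can stay on one box for so long.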

gfodor · on Feb 5, 2020

Yeah, 10 years ago you could support millions of users on a high-traffic site on a single box. (This was on Postgres in my case.) Today, I'd guess a single box can handle at least 10x that, if not significantly more, thanks to both software optimizations and increased hardware capabilities.

Truthfully, unless you're working on some kind of non-transactional problem like analytics, assuming that user activity will ever force you to shard the data or scale out reads is borderline irrational without extremely robust projections. The database will be the last domino to fall after you've added sufficient caching and software optimization. It's so far downfield for most projects (and the incidental complexity cost so high) that my personal bias is that even having the conversation usually isn't worth the opportunity cost vs talking about something else.

Even then, the first thing to fall over will probably be the write-heavy, analytics-like tables (usually append-only), and the cause will be index write load. Out of the box, you can often 'solve' this by partitioning the table (instead of sharding). In modern DBs, this is a simple schema change.
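
To make the partitioning point concrete, here's a rough sketch that generates PostgreSQL declarative partitioning DDL (`PARTITION BY RANGE`, available since PostgreSQL 10) for an append-only table split by month. The table name `events`, the column `created_at`, and the `id`/`payload` columns are all made up for illustration, and this only covers creating a new partitioned table, not moving existing data into one.

```python
from datetime import date
from typing import List


def next_month(d: date) -> date:
    """First day of the month following d."""
    return date(d.year + 1, 1, 1) if d.month == 12 else date(d.year, d.month + 1, 1)


def monthly_partition_ddl(table: str, column: str,
                          start: date, months: int) -> List[str]:
    """Build DDL for a range-partitioned parent plus one child per month.

    The column list is a hypothetical example schema, not a recommendation.
    """
    statements = [
        f"CREATE TABLE {table} (id bigint, {column} timestamptz NOT NULL, "
        f"payload jsonb) PARTITION BY RANGE ({column});"
    ]
    lower = date(start.year, start.month, 1)
    for _ in range(months):
        upper = next_month(lower)
        name = f"{table}_{lower.year}_{lower.month:02d}"
        # Range bounds are inclusive at FROM and exclusive at TO.
        statements.append(
            f"CREATE TABLE {name} PARTITION OF {table} "
            f"FOR VALUES FROM ('{lower.isoformat()}') TO ('{upper.isoformat()}');"
        )
        lower = upper
    return statements


if __name__ == "__main__":
    for stmt in monthly_partition_ddl("events", "created_at", date(2020, 2, 5), 3):
        print(stmt)
```

The idea is that new rows land in the current month's partition, so any indexes being written to are the small per-partition ones rather than one huge index over the whole table.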