That reminds me of the entertaining "I just want to serve 5 terabytes. Why is this so difficult?" video that someone made inside Google. It satirizes the difficulty of getting things done at production scale.
Nothing in that video is about scale. Or the difficulty of serving 5TB. It's about the difficulty of implementing n+1 redundancy with graceful failover inside cloud providers.
User: "I want to serve 5TB."
Guru: "Throw it in a GKE PV and put nginx in front of it."
Congratulations, you are already serving 5TB at production scale.
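The guru's answer really is about that short: a sketch of what it might look like, assuming a PersistentVolumeClaim (here hypothetically named `data`) already holds the files. The names and image tag are illustrative, not a tested manifest:

```yaml
# Hypothetical minimal Pod: nginx serving static files from a GKE persistent volume.
apiVersion: v1
kind: Pod
metadata:
  name: static-server
spec:
  containers:
    - name: nginx
      image: nginx:stable
      ports:
        - containerPort: 80
      volumeMounts:
        - name: data
          mountPath: /usr/share/nginx/html  # nginx's default docroot
          readOnly: true
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: data  # hypothetical pre-existing PVC holding the 5TB
```

Put a Service or load balancer in front and you're serving the files; everything beyond that is the redundancy and failover the video is actually satirizing.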
The interesting thing is that there are also paradoxes of large scale: things that get more difficult with increasing size.
Medium- and small-scale operations can often be more flexible because they don't have to enforce the uniformity that large scale demands. While they may not be able to afford the optimizations or discounts that come with large, standardized purchases, they can offer personalized services that large-scale operators cannot hope to provide.
On a related note, providers that run an independent instance per customer (so no multi-tenancy) typically see roughly three more nines of availability than, say, AWS. On-prem enterprise software is a typical example of this, and it is still used in safety-critical industries for this reason.
Eventually, all outages are black swan events. If you have 1000 independent instances (i.e., 1000 customers), then when the unexpected thing hits, you're still 99.9% available in aggregate during the time the one impacted instance is down.
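The arithmetic behind that claim is just the fraction of instances still up, assuming failures hit one independent instance at a time:

```python
def aggregate_availability(total_instances: int, instances_down: int) -> float:
    """Fraction of customers unaffected while some instances are down.

    Assumes each customer runs on exactly one independent instance,
    so an outage of one instance affects only that one customer.
    """
    return 1 - instances_down / total_instances

# With 1000 independent customer instances and one taken out by an
# unexpected failure, 99.9% of customers see no outage at all.
print(aggregate_availability(1000, 1))  # 0.999
```

Contrast this with a multi-tenant platform, where the same black swan can take every customer down at once.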
Also, you can probably fix the root cause permanently before the same black swan hits a second instance.
https://www.youtube.com/watch?v=3t6L-FlfeaI