One of the projects I was called in to help with some time ago had set up a service with over a thousand servers. They reached out for help improving performance because, in their opinion, the network and network filesystem were too slow.
They invested an incredible amount of time in learning technology and scaling their application but forgot about the need to learn the fundamentals -- data structures and efficiency.
Fast forward 1.5 years: the application, as I left it, ran on a single server using about 10% of its capacity. A second server is just a hot standby backup.
The service literally went from being able to process tens of transactions per second to a hundred thousand transactions per second on a single node.
More than that, we threw away most of the exotic technology that was used there -- greatly improving team productivity. The implementation is simpler than ever: layers upon layers of microservices were replaced with regular method calls, and a lot of infrastructure was simply removed without needing to be replaced by anything.
People who do not learn from history (fundamentals) are doomed to repeat the same mistakes.
Something people seem to forget when designing systems based on what they've learned is that modern individual machines have approximately the power of the top supercomputer ~20 years ago, and a few racks of modern machines can in many ways match the top supercomputer of ~10 years ago. Approaches to solve large problems are continually changing and it's almost always worth a big-picture design session when looking at a familiar-seeming problem. By the time students graduate from a 4-year program computing is about 10X better than when they started.
I still remember some interviews within the last decade where single-core machines with a few GB of RAM were a default assumption for whiteboard designs, or that spinning disks were the default.
Well, I catch myself not "updating" my own mental model of hardware. Just recently I caught myself putting a lot of effort into solving a problem that only exists if persistent storage is too slow to be used for calculations. Then I facepalmed hard when I realized I could just move that entire 500GB data structure to an NVMe drive and treat it almost as if it were in memory.
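For anyone curious what "treat it almost as if it were in memory" can look like in practice, here is a minimal Java sketch using memory-mapped files. All names are illustrative, not the actual code from that project; it assumes a read-only file of fixed-width, 8-byte-aligned records, so no read ever straddles a mapping window.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Maps a large read-only file in 1 GiB windows so it can be indexed like an in-memory array. */
public final class MappedTable implements AutoCloseable {
    // One MappedByteBuffer is limited to ~2 GiB, so a big file is covered by several windows.
    private static final long WINDOW = 1L << 30; // 1 GiB, a multiple of 8 so aligned reads never straddle windows

    private final FileChannel channel;
    private final MappedByteBuffer[] windows;

    public MappedTable(Path file) throws IOException {
        channel = FileChannel.open(file, StandardOpenOption.READ);
        long size = channel.size();
        int count = (int) ((size + WINDOW - 1) / WINDOW);
        windows = new MappedByteBuffer[count];
        for (int i = 0; i < count; i++) {
            long offset = i * WINDOW;
            windows[i] = channel.map(FileChannel.MapMode.READ_ONLY, offset, Math.min(WINDOW, size - offset));
        }
    }

    /** Reads an 8-byte value at an absolute byte offset; the OS page cache and the NVMe do the rest. */
    public long readLong(long offset) {
        // Note: ByteBuffer defaults to big-endian; call order(...) on the windows if the file is little-endian.
        return windows[(int) (offset / WINDOW)].getLong((int) (offset % WINDOW));
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

The application just indexes into the file as if it were a giant array; the page cache keeps hot regions in RAM and the NVMe makes the misses cheap.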
But in general I think that the problem isn't that people are not "updating" their understanding. Even 10 years ago it wasn't a huge problem getting hundreds of thousands of transactions per second on a single modest machine.
The problem is rather that people rely on more and more layers of abstraction for very small gains.
Example: I get that Python is a nice language (for somebody who does not know Lisp). But is it worth choosing Python for a small improvement in productivity on a problem that requires a lot of throughput, only to then suffer performance issues and spend many times more effort trying to improve performance? I don't think so.
Or, closer to my space: is Spring Data (Java) worth the very incremental productivity improvements if it completely destroys your application's performance? The application I described in my parent post used Spring Data MongoDB, which kinda means it was fetching entities one by one -- which is extremely costly.
By replacing it with the bare MongoDB reactive driver, ensuring data is streamed in large batches (why read one user's data when you can read 10k at a time), and getting rid of costly aggregations in favour of application-side processing, we improved throughput by many orders of magnitude WHILE reducing load on MongoDB.
Granted, there is a little bit of additional complexity (on the order of 10% more application code), but the performance improvement alone means the team can breathe and focus on other problems, like modelling the domain correctly.
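For illustration, here is a minimal sketch of that batched-read approach, using the plain MongoDB Reactive Streams driver together with Project Reactor. The connection string, collection names, batch size, and the summarize step are hypothetical stand-ins, not the actual application code:

```java
import java.util.List;

import com.mongodb.reactivestreams.client.MongoClient;
import com.mongodb.reactivestreams.client.MongoClients;
import com.mongodb.reactivestreams.client.MongoCollection;
import org.bson.Document;
import reactor.core.publisher.Flux;

public class BatchedRead {
    public static void main(String[] args) {
        // Plain reactive driver instead of Spring Data repositories.
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> users = client.getDatabase("app").getCollection("users");

            long chunks = Flux.from(users.find().batchSize(10_000)) // stream in large batches, not one entity at a time
                    .buffer(10_000)                                  // hand the application whole chunks to work on
                    .map(BatchedRead::summarize)                     // "aggregation" done in application code, not in MongoDB
                    .count()
                    .block();

            System.out.println("chunks processed: " + chunks);
        }
    }

    // Hypothetical application-side processing standing in for a former server-side aggregation.
    private static long summarize(List<Document> batch) {
        return batch.stream().filter(d -> d.getBoolean("active", false)).count();
    }
}
```

The point is not this exact pipeline; it is that the driver hands the application thousands of documents per round trip, so the per-entity repository hops (and the database-side aggregation work) disappear.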