One of the projects I was called in to help with some time ago had set up a service with over a thousand servers. They reached out for help improving performance because, in their opinion, the network and network filesystem were too slow.
They invested an incredible amount of time in learning technology and scaling their application but forgot about the need to learn the fundamentals -- data structures and efficiency.
Fast forward 1.5 years: the application, as I left it, ran on a single server using about 10% of its capacity. A second server is just a hot standby backup.
The service literally went from being able to process tens of transactions per second to a hundred thousand transactions per second on a single node.
More than that, we threw away most of the exotic technology that was used there -- greatly improving team productivity. The implementation is simpler than ever: layers upon layers of microservices were replaced with regular method calls, and a lot of infrastructure was simply removed without needing to be replaced by anything.
People who do not learn from history (fundamentals) are doomed to repeat the same mistakes.
Something people seem to forget when designing systems based on what they've learned is that modern individual machines have approximately the power of the top supercomputer ~20 years ago, and a few racks of modern machines can in many ways match the top supercomputer of ~10 years ago. Approaches to solve large problems are continually changing and it's almost always worth a big-picture design session when looking at a familiar-seeming problem. By the time students graduate from a 4-year program computing is about 10X better than when they started.
I still remember some interviews within the last decade where single-core machines with a few GB of RAM were a default assumption for whiteboard designs, or that spinning disks were the default.
Well, I catch myself not "updating" my own mental model of hardware. Just recently I caught myself putting a lot of effort into solving a problem that only exists if persistent storage is too slow to be used for calculations. Then I facepalmed hard when I realized I could just move that entire 500GB data structure to an NVMe drive and treat it almost as if it were in memory.
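For anyone curious what "treat it almost as if it were in memory" can look like in practice, here is a minimal Java sketch using memory-mapped files. All names are illustrative, not the actual code from that project; it assumes a read-only file of fixed-width, 8-byte-aligned records, so no read ever straddles a mapping window.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Maps a large read-only file in 1 GiB windows so it can be indexed like an in-memory array. */
public final class MappedTable implements AutoCloseable {
    // One MappedByteBuffer is limited to ~2 GiB, so a big file is covered by several windows.
    private static final long WINDOW = 1L << 30; // 1 GiB, a multiple of 8 so aligned reads never straddle windows

    private final FileChannel channel;
    private final MappedByteBuffer[] windows;

    public MappedTable(Path file) throws IOException {
        channel = FileChannel.open(file, StandardOpenOption.READ);
        long size = channel.size();
        int count = (int) ((size + WINDOW - 1) / WINDOW);
        windows = new MappedByteBuffer[count];
        for (int i = 0; i < count; i++) {
            long offset = i * WINDOW;
            windows[i] = channel.map(FileChannel.MapMode.READ_ONLY, offset, Math.min(WINDOW, size - offset));
        }
    }

    /** Reads an 8-byte value at an absolute byte offset; the OS page cache and the NVMe do the rest. */
    public long readLong(long offset) {
        // Note: ByteBuffer defaults to big-endian; call order(...) on the windows if the file is little-endian.
        return windows[(int) (offset / WINDOW)].getLong((int) (offset % WINDOW));
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

The application just indexes into the file as if it were a giant array; the page cache keeps hot regions in RAM and the NVMe makes the misses cheap.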
But in general I think that the problem isn't that people are not "updating" their understanding. Even 10 years ago it wasn't a huge problem getting hundreds of thousands of transactions per second on a single modest machine.
The problem is rather that people rely on more and more layers of abstraction for very small gains.
Example: I get that Python is a nice language (for somebody who does not know Lisp). But is it worth choosing Python for a small improvement in productivity on a problem that requires a lot of throughput, only to then suffer performance issues and spend many times more effort trying to improve performance? I don't think so.
Or, closer to my space: is Spring Data (Java) worth the very incremental productivity improvements if it completely destroys your application's performance? The application I described in my parent post used Spring Data MongoDB, which kinda means it was fetching entities one by one -- which is extremely costly.
By replacing it with the bare MongoDB reactive driver, ensuring data is streamed in large batches (why read one user's data when you can read 10k at a time), and getting rid of costly aggregations in favour of application-side processing, we improved throughput by many orders of magnitude WHILE reducing load on MongoDB.
Granted, there is a little bit of additional complexity (on the order of 10% more application code), but the performance improvement alone means the team can breathe and focus on other problems, like modelling the domain correctly.
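For illustration, here is a minimal sketch of that batched-read approach, using the plain MongoDB Reactive Streams driver together with Project Reactor. The connection string, collection names, batch size, and the summarize step are hypothetical stand-ins, not the actual application code:

```java
import java.util.List;

import com.mongodb.reactivestreams.client.MongoClient;
import com.mongodb.reactivestreams.client.MongoClients;
import com.mongodb.reactivestreams.client.MongoCollection;
import org.bson.Document;
import reactor.core.publisher.Flux;

public class BatchedRead {
    public static void main(String[] args) {
        // Plain reactive driver instead of Spring Data repositories.
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> users = client.getDatabase("app").getCollection("users");

            long chunks = Flux.from(users.find().batchSize(10_000)) // stream in large batches, not one entity at a time
                    .buffer(10_000)                                  // hand the application whole chunks to work on
                    .map(BatchedRead::summarize)                     // "aggregation" done in application code, not in MongoDB
                    .count()
                    .block();

            System.out.println("chunks processed: " + chunks);
        }
    }

    // Hypothetical application-side processing standing in for a former server-side aggregation.
    private static long summarize(List<Document> batch) {
        return batch.stream().filter(d -> d.getBoolean("active", false)).count();
    }
}
```

The point is not this exact pipeline; it is that the driver hands the application thousands of documents per round trip, so the per-entity repository hops (and the database-side aggregation work) disappear.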