Except you really do need to care about this complexity on CPUs. Things like cache locality & predictable access patterns are critical to achieving good CPU performance. This is why there's things like data-oriented design, SoA vs. AoS, and Z-order curves. It's also why linked-lists are so incredibly awful in practice, despite having superb algorithmic performance in theory.
A big reason programming for CPUs doesn't seem as complex is because the vast, vast majority of time nobody actually cares about CPU performance. We all just prefer to pretend a runtime or JIT or compiler managed to magically make a language that's god-awful horrendous on modern CPUs run fast. They didn't, we just all look the other way though.
The difference between CPUs & GPUs is when people reach for GPUs, such as for games or HPC, those are also the people that care a lot about performance. And guides like this are for them.
I don't think we're fundamentally in disagreement. But I will say that there's a huge amount of value in CPUs not forcing you to care about the complexity when it doesn't matter, and it doesn't always matter.
A big reason programming for CPUs doesn't seem as complex is because the vast, vast majority of time nobody actually cares about CPU performance. We all just prefer to pretend a runtime or JIT or compiler managed to magically make a language that's god-awful horrendous on modern CPUs run fast. They didn't, we just all look the other way though.
The difference between CPUs & GPUs is when people reach for GPUs, such as for games or HPC, those are also the people that care a lot about performance. And guides like this are for them.