The article does go into cache coherency, which is very much intertwined with multicore parallelism:
> The cache coherency protocol is one of the hardest parts of a modern CPU to make both fast and correct. Most of the complexity involved comes from supporting a language in which data is expected to be both shared and mutable as a matter of course.
I feel like we live in a world where everyone works very hard to pretend that C is our best low-level language, when in reality an APL-like purely functional array language would be a better candidate.
Fair enough, though I still think that even there the author was more concerned with showing that the single-threaded C virtual machine model does not map to how a CPU actually behaves than with arguing that a natively parallel language would be best suited to modeling a contemporary multicore or data-parallel processor.