It's difficult for several reasons, and you've identified one of them: people aren't taught how to write code that can take advantage of concurrency. Except…this includes you.
I work as a performance engineer, and a lot of my job is actually undoing concurrency written by people who do it incorrectly and create problems worse than they ever would have had without it. People will farm work out to a bunch of threads, except they'll make the work units so small that the synchronization overhead is an order of magnitude more than the actual work being done. They'll create a thread pool to execute their work and forget to cap its size, or use an inappropriate spawning heuristic, and cause a thread explosion. They'll struggle mightily to apply concurrency to a problem that doesn't parallelize trivially due to involved data dependencies, and write complex code with subtle bugs in it.
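To make that first pitfall concrete, here's a hypothetical sketch in Go (the workload, a sum of 1..N, and both function names are mine, not from the thread): both versions compute the same result, but the per-item version pays one goroutine spawn and one channel send per single addition, while the chunked version synchronizes only once per worker.

```go
// Hypothetical illustration: the same total work, partitioned two ways.
package main

import (
	"fmt"
	"runtime"
	"sync"
)

const n = 100000

// perItem spawns one goroutine per element; every result crosses a channel,
// so the synchronization cost dwarfs the one addition of actual work.
func perItem() int64 {
	ch := make(chan int64, n)
	var wg sync.WaitGroup
	for i := 1; i <= n; i++ {
		wg.Add(1)
		go func(v int64) {
			defer wg.Done()
			ch <- v // one channel send per unit of work
		}(int64(i))
	}
	wg.Wait()
	close(ch)
	var sum int64
	for v := range ch {
		sum += v
	}
	return sum
}

// chunked gives each worker one large contiguous range: the same additions,
// but only NumCPU synchronization points instead of n.
func chunked() int64 {
	workers := runtime.NumCPU()
	chunk := (n + workers - 1) / workers
	partial := make([]int64, workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			lo, hi := w*chunk+1, (w+1)*chunk
			if hi > n {
				hi = n
			}
			for i := lo; i <= hi; i++ {
				partial[w] += int64(i) // no sharing inside the loop
			}
		}(w)
	}
	wg.Wait()
	var sum int64
	for _, p := range partial {
		sum += p
	}
	return sum
}

func main() {
	fmt.Println(perItem() == chunked()) // same answer, wildly different overhead
}
```

Benchmarking the two (e.g. with `go test -bench`) is exactly the kind of measurement that exposes the problem before it ships.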
Writing concurrent code is hard. In general, nobody actually wants concurrency*, it's just a thing we deal with because single-threaded performance has stopped advancing as fast as we'd want it to. As an industry we're slowly getting more familiar with it, and providing better primitives to harness it safely and efficiently, but the overall effort is a whole lot harder than just slapping some sort of concurrent for loop around every problem.
*Except for some very rare exceptions that cannot be time shared
This. As well as actually partitioning the problem. Things like DOM updates tend to be bad for this because the way they're specified requires you to use the result of a previous computation.
No amount of concurrency will save you from memory bandwidth problems, and can quite often make them worse.
> the synchronization overhead is an order of magnitude more than the actual work being done.
Yes, this is a common pitfall of concurrent programming applied incorrectly. I am aware of this. I am also aware that it is something that can be measured with appropriate benchmarks.
Just like the thread-explosion problem of uncapped worker pools, it isn't solved, but it is made a lot easier to handle, by baking the capability to map logical execution threads onto OS threads right into the language.
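A minimal sketch of what that buys you, using Go as the example of a language with built-in M:N threading (the task count and helper name are mine): the runtime multiplexes cheap logical threads onto a small pool of OS threads, so spawning 100,000 of them is routine rather than a thread explosion.

```go
// Hypothetical sketch: 100,000 logical threads on a handful of OS threads.
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// run launches one goroutine per task and counts completions.
// Each goroutine starts with a few KB of stack, not a full OS thread.
func run(tasks int) int64 {
	var done int64
	var wg sync.WaitGroup
	for i := 0; i < tasks; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			atomic.AddInt64(&done, 1)
		}()
	}
	wg.Wait()
	return done
}

func main() {
	fmt.Println(run(100000)) // prints 100000
}
```

The same program written with one OS thread per task would exhaust memory or hit the process thread limit long before 100,000.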
> Writing concurrent code is hard.
Writing really good concurrent code is hard. But not harder than writing good abstractions, or writing performant code, or writing maintainable code.