In general, parallelization was a messy kludge in older languages originally int...

In general, parallelization was a messy kludge in older languages originally intended for single CPU machine contexts. Additionally, many modern languages inherited the same old library ecosystem issues with Simplified Wrapper and Interface Generator template code (Julia also offers similar support).

Only a few like Go ecosystem developers tended to take the time to refactor many useful core tools into clean parallelized versions in the native ecosystem, and to a lesser extent Julia devs seem to focus on similar goals due to the inherent ease of doing this correctly.

When one compares the complexity of a broadcast operator version of some function in Julia, and the amount of effort needed to achieve similar results in pure C/C++... the answer of where the efficiency gains arise should be self evident.

One could always embed a Julia programs inside a c wrapper if it makes you happier. =)

https://www.youtube.com/watch?v=GZEOb6p1yvU