If this got you excited, consider reading Bartosz Milewski's post [0] about it. It's not too hard to follow, but it helps to have understood everything in the blog post the OP linked first.
The only real issue was portability. Before C++11 you could already do all of this on any given system, using whatever that particular compiler and platform offered. Now the C++ standard hides those implementation details behind a unified abstraction.
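To make the "unified abstraction" concrete, here is a minimal sketch of a release/acquire handoff written only against the C++11 standard library. The names (`payload`, `ready`, `producer`, `consumer`, `run_handoff`) are made up for illustration and are not taken from the linked post. Before C++11 you would have needed compiler- or OS-specific fences and intrinsics to get the same guarantee.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names for illustration; none of this comes from the linked post.
int payload = 0;                 // plain, non-atomic data
std::atomic<bool> ready{false};  // flag that publishes the payload

void producer() {
    payload = 42;                                  // 1. write the data
    ready.store(true, std::memory_order_release);  // 2. publish it
}

int consumer() {
    // Spin until the flag is set. The acquire load pairs with the release
    // store above, so the write to payload is visible once the loop exits.
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return payload;
}

// Runs one handoff between two threads and returns what the consumer saw.
int run_handoff() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = -1;
    std::thread c([&seen] { seen = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();
    assert(seen == 42);  // guaranteed by the release/acquire pairing
    return seen;
}
```

The same source should behave the same way on x86, ARM, or anything else with a conforming compiler; how the ordering is actually enforced (plain loads and stores, barriers, special instructions) is left to the implementation.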
[0]: https://bartoszmilewski.com/2008/12/01/c-atomics-and-memory-...