This is available only under certain architectures (x86 is one of them as you've...

YZF · on May 3, 2015

Mutexes are typically implemented on top of atomic instructions. You'd use something like an atomic compare/exchange to acquire the mutex, and if there's no contention you've got it. If there is contention, you fall back to the OS's synchronization constructs, which are typically much slower... An atomic increment should always be faster than acquiring a mutex and then incrementing...
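
As a rough sketch of the difference, in C++: the names `SpinLock`, `count_with_lock` and `count_with_atomic` are made up for illustration, and the lock here just spins and yields rather than falling back to an OS wait the way a real mutex would under contention. The point is only that the locked version pays for an atomic compare/exchange plus a release store around every increment, while the atomic version does a single read-modify-write.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical lock built on atomic compare/exchange (illustration only).
// A real mutex would block in the OS on contention instead of spinning.
class SpinLock {
public:
    void lock() {
        bool expected = false;
        // Try to flip false -> true; on failure, reset `expected` and retry.
        while (!locked_.compare_exchange_weak(expected, true,
                                              std::memory_order_acquire,
                                              std::memory_order_relaxed)) {
            expected = false;
            std::this_thread::yield();
        }
    }
    void unlock() { locked_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> locked_{false};
};

// Each increment: acquire the lock, bump a plain counter, release the lock.
long count_with_lock(int threads, int iters) {
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                std::lock_guard<SpinLock> guard(lock);
                ++counter;
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}

// Each increment: one atomic read-modify-write, no lock at all.
long count_with_atomic(int threads, int iters) {
    std::atomic<long> counter{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter.load();
}
```

Both functions should return `threads * iters`. Timing them against each other (and against `std::mutex`) on your own hardware is the only way to see the actual gap.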

hurin · on May 3, 2015

How could it be slower than a mutex? Is it from compiling all your increments to use LOCK? What exactly were you timing? That sounds really strange.