I strongly disagree: overly strong barriers on non-TSO hardware (say, ARMv7 or ARMv8) will just cause even more of that memory contention, for no good reason.
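To make "no stronger than needed" concrete, here is a minimal sketch. Rust is my choice of language, and the `TryLock` type is hypothetical, not from any library. Taking the lock only needs `Acquire` ordering and releasing it only needs `Release`, with `Relaxed` on a failed attempt. It doesn't need `SeqCst` or an extra fence on every operation.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical minimal lock, used only to illustrate the orderings a lock needs.
pub struct TryLock {
    locked: AtomicBool,
}

impl TryLock {
    pub const fn new() -> Self {
        TryLock { locked: AtomicBool::new(false) }
    }

    /// One attempt to take the lock.
    /// Success uses `Acquire`: accesses in the critical section cannot move
    /// before the point where the lock was taken.
    /// Failure uses `Relaxed`: we did not get the lock, so there is nothing to order.
    pub fn try_lock(&self) -> bool {
        self.locked
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_ok()
    }

    /// `Release`: writes made inside the critical section become visible
    /// to the next thread that successfully acquires the lock.
    pub fn unlock(&self) {
        self.locked.store(false, Ordering::Release);
    }
}
```

Swapping every ordering for `SeqCst` would still be correct, just stronger than the lock actually requires.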
Also, on a contended lock, there's a very good chance that you'll find the lock can be taken right before you block.
It also doesn't matter on the slow path, on any architecture: that path is dominated by OS operations such as yielding, which already execute the strongest barriers possible.
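Here is a rough sketch of that spin-then-block pattern, again in Rust. It is hypothetical: `SpinYieldLock` and `SPIN_LIMIT` are made up for illustration, and `thread::yield_now` stands in for whatever OS operation the slow path really uses. Waiters spin on relaxed loads and take one last shot at the lock right before yielding. Only after that do they enter the OS.

```rust
use std::hint;
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

/// Hypothetical tuning value: how many spins before falling back to the OS.
const SPIN_LIMIT: u32 = 100;

/// Hypothetical spin-then-yield lock.
pub struct SpinYieldLock {
    locked: AtomicBool,
}

impl SpinYieldLock {
    pub const fn new() -> Self {
        SpinYieldLock { locked: AtomicBool::new(false) }
    }

    /// Acquire on success, Relaxed on failure (a failed attempt orders nothing).
    fn try_lock(&self) -> bool {
        self.locked
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_ok()
    }

    pub fn lock(&self) {
        loop {
            // Spin with cheap relaxed loads; only attempt the CAS when the
            // lock looks free, so waiters don't keep writing the cache line.
            for _ in 0..SPIN_LIMIT {
                if !self.locked.load(Ordering::Relaxed) && self.try_lock() {
                    return;
                }
                hint::spin_loop();
            }
            // Last check right before giving up the CPU: on a contended lock
            // the holder has often just released it.
            if self.try_lock() {
                return;
            }
            // Slow path: the OS call costs far more than any fence we could add.
            thread::yield_now();
        }
    }

    /// Release: publish the critical section's writes to the next owner.
    pub fn unlock(&self) {
        self.locked.store(false, Ordering::Release);
    }
}
```

Note that no ordering around `yield_now` is strengthened; the acquire on the successful `try_lock` is all the synchronization the waiter needs.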
On Intel, of course, it doesn't matter, since x86 is already TSO.