And what about that has anything to do with C specifically? Every useful programming language requires that cause precede effect, and every architecture that allows load-store reordering has memory barrier instructions. Specifically, where would code written in C require the compiler to generate one of these instructions, where code hand-written for the processor's native instruction set would not?
It matches C's semantics exactly, to the point where ARM chose a specific flavor of acquire/release (the load-acquire and store-release instructions added in ARMv8) to match the "sequential consistency for data-race-free programs" model. That gives the required ordering without any global barriers or unnecessarily strong guarantees, while still allowing other accesses to be reordered.
(I should note that I believe this is actually C++'s memory model, which C adopted as well, and perhaps some other languages have adopted it too.)
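To make the acquire/release point concrete, here is a minimal sketch in C11 of the usual message-passing pattern. The names `payload`, `ready`, `producer`, and `run_demo` are made up for illustration, and it assumes a platform with POSIX threads. On AArch64, compilers typically lower the release store and acquire load to the store-release and load-acquire instructions rather than a standalone barrier, though the exact output depends on the compiler and target; hand-written assembly for the same pattern would need equivalent ordering.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int payload;       /* plain, non-atomic data being published */
static atomic_int ready;  /* flag guarding the payload (starts at 0) */

/* Writer: fill in the data, then publish it with a release store.
 * The release ordering keeps the write to payload from being
 * reordered after the store to ready. */
static void *producer(void *arg)
{
    (void)arg;
    payload = 42;
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

/* Reader (hypothetical helper): spin on an acquire load of the flag.
 * Once it observes 1, the earlier write to payload is guaranteed to
 * be visible, so reading payload is not a data race. */
int run_demo(void)
{
    pthread_t t;

    payload = 0;
    atomic_store_explicit(&ready, 0, memory_order_relaxed);

    if (pthread_create(&t, NULL, producer, NULL) != 0)
        return -1;

    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        ;  /* busy-wait until the producer publishes */

    int seen = payload;
    pthread_join(t, NULL);
    return seen;
}
```

If both accesses to `ready` were `memory_order_relaxed` instead, the C memory model would no longer guarantee that the reader sees the write to `payload`, and a weakly ordered machine would be free to show it the stale value.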