The details are hazy for me, but as far as I know all relevant CPUs have coherent caches; what differs between them is the ordering guarantees they make.
x86 has "total store ordering", meaning stores made by core 1 will always be observed by core 2 in the order they were issued. ARM doesn't make that guarantee.
In practice it doesn't matter for writing correct programs unless you write assembly: even if the CPU has total store ordering, the compiler is allowed to reorder stores unless you put an appropriate barrier in the high-level language source.
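To make the "appropriate barrier" part concrete, here is a rough sketch in C++ using release/acquire atomics. The names (payload, ready, producer, consumer, run_once) are made up for illustration, and this is just one way to express the ordering, not the only one.

```cpp
#include <atomic>
#include <thread>

int payload = 0;                 // ordinary, non-atomic data
std::atomic<bool> ready{false};  // flag that publishes the data

void producer() {
    payload = 42;
    // Release store: neither the compiler nor the CPU may move the
    // write to payload after this store.
    ready.store(true, std::memory_order_release);
}

int consumer() {
    // Acquire load: once this observes true, the earlier write to
    // payload is guaranteed to be visible to this thread.
    while (!ready.load(std::memory_order_acquire)) {
        // spin until the producer publishes
    }
    return payload;
}

// Hypothetical helper: runs one producer/consumer pair and returns
// the value the consumer saw.
int run_once() {
    payload = 0;
    ready.store(false);
    int seen = -1;
    std::thread c([&] { seen = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();
    return seen;
}
```

With the release/acquire pair, run_once() returns 42 on x86 and ARM alike. If ready were a plain bool instead, this would be a data race, and the compiler would be free to reorder the two stores or hoist the load out of the loop, regardless of what the hardware guarantees.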