
If you've never had to deal with this stuff before, you probably follow a naive model of memory, wherein all operations are "sequentially consistent": you pick a random thread, execute its next memory instruction, and every other thread immediately sees the update. That model is not how any multicore hardware is actually implemented, because it is slow and often unnecessary.

Now, it is known that, if your program is correctly synchronized, then the existing hardware models are indistinguishable from sequentially consistent execution. This gives rise to the data-race-free (DRF) model that underlies all modern language-level programming models: data races can only occur in the absence of proper synchronization, so we declare that behavior undefined [1], and correctly synchronized code gets sequential consistency back.

The primary issue is that there are two entities playing games with your code to support faster execution: both the compiler and the hardware are making assumptions to speed things up. From the perspective of hardware (i.e., why using "volatile" to stop the compiler from playing games isn't good enough), different processors may have different copies of values in their caches. While cache coherency requires that, at any given point in time, every processor must agree on the value of every memory location, there is great leeway to reorder loads and stores so that they may execute in different orders than the instruction sequence implies. In the extreme, it is possible that the load of *p may happen before the load of p itself (this is the Alpha memory model, and comes from the existence of uncoordinated cache banks).

[1] Okay, there's a slight untruth here: C++11 "relaxed atomics" are effectively data races as the hardware would observe them, but they are not undefined in the C++11 sense. This is where it's important to point out that we have been struggling for over a decade to come up with a suitably precise definition of relaxed atomics that we're happy with--it is by no means a solved problem.



As a summary to see if I understand you, it sounds like what you're saying is that the CPU will calculate updates to memory, but in the interest of speed and efficiency will not actually write the new value back into RAM for some unspecified period of time?


Both yes and no. If you replace "RAM" in your statement with "cache," then you get a more accurate summary for the situation that gives rise to the issues we're talking about.

But it's also the case that caches avoid writing back into main memory as much as possible, since bandwidth to main memory is quite limited. However, cache coherency protocols make all the caches agree on each value, so that committing a value to cache is equivalent to committing it to main memory.





