Simple enough. Except... what if Thread#2 had the following:
while(i<2); // Infinite loop waiting for i to become 2
foo();
Then in Thread#3:
while(i<3); // Infinite loop waiting for i to become 3
bar();
Then in Thread#4:
while(i<4); // Infinite loop waiting for i to become 4
baz();
Then in Thread#5:
while(i<5); // Infinite loop waiting for i to become 5
foobar();
As we can see here, "i" is a synchronization variable. We only know that fact by knowing how the other threads work. Once the compiler combines the increments and i no longer steps from 1 to 2 to 3 to 4 to 5, the threads no longer synchronize and the code gains a race condition: all of the threads may run at once, since i starts off at 5.
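To make the compiler's reasoning concrete, here is a sketch of a hypothetical Thread#1 body (the names and sleep durations are illustrative, not from the original code): the five increments with sleeps in between have exactly the same single-threaded net effect as one combined assignment, which is why the optimizer may legally merge them.

```cpp
#include <chrono>
#include <thread>

int i = 0;  // plain int: the compiler assumes no other thread watches it

// Hypothetical body for Thread#1, as the programmer wrote it.
void thread1_as_written() {
    i++; std::this_thread::sleep_for(std::chrono::milliseconds(1));
    i++; std::this_thread::sleep_for(std::chrono::milliseconds(1));
    i++; std::this_thread::sleep_for(std::chrono::milliseconds(1));
    i++; std::this_thread::sleep_for(std::chrono::milliseconds(1));
    i++;
}

// What the optimizer may legally emit instead: the single-threaded net
// effect is identical, but the spin loops in Threads #2..#5 never see
// the intermediate values 1 through 4.
void thread1_as_optimized() {
    i += 5;
    std::this_thread::sleep_for(std::chrono::milliseconds(4));
}
```

Both versions leave i at 5; only the intermediate states, which the other threads depend on, differ.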
-----------
For better or worse, modern programmers must think about the messages passed between threads. After all, semaphores are often just i++ and i-- statements at the lowest level (perhaps with a touch of atomic_swap, or a lock instruction, depending on your architecture).
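A counting semaphore really can be little more than an atomic i++ and i-- under the hood. A minimal spinning sketch (not any standard library's implementation, just an illustration assuming C++11 atomics):

```cpp
#include <atomic>

// Minimal spinning counting semaphore: post() is an atomic i++,
// wait() is an atomic i-- that refuses to drop below zero.
class SpinSemaphore {
    std::atomic<int> count;
public:
    explicit SpinSemaphore(int initial) : count(initial) {}

    void post() { count.fetch_add(1); }  // the i++ half

    void wait() {                        // the i-- half
        for (;;) {
            int c = count.load();
            // compare_exchange is the "touch of atomic_swap" the
            // text mentions: decrement only if nobody raced us.
            if (c > 0 && count.compare_exchange_weak(c, c - 1))
                return;
        }
    }

    int value() const { return count.load(); }
};
```

A real implementation would block in the kernel instead of spinning, but the arithmetic at its core is the same.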
Modern code must mark when a variable matters for inter-thread synchronization, to selectively disable the compiler's optimizer (funnily enough, the same marking is also needed to strongly order the L1 cache and the out-of-order core of a modern processor).
As such, proper threading requires a top-to-bottom, language-level memory model: the "knowledge" that the i++ statements cannot be optimized or combined across the sleep statements.
---------
This is no longer an issue on modern platforms. Today we have C++11's memory model, which strongly defines where and when optimizations can occur, with "seq_cst" (sequentially consistent) memory ordering as the default.
There is also a faster, but slightly harder to understand, memory model of acquire and release. The acquire/release model is especially useful on more weakly ordered systems like ARM and POWER9.
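The classic acquire/release pattern is publishing ordinary data through an atomic flag. A minimal sketch (names are illustrative):

```cpp
#include <atomic>
#include <thread>

int payload = 0;                 // ordinary, non-atomic data
std::atomic<bool> ready{false};

void producer() {
    payload = 42;                // (1) write the data...
    // (2) ...then publish it. The release barrier forbids the compiler
    // and CPU from reordering the write in (1) after this store.
    ready.store(true, std::memory_order_release);
}

int consumer() {
    // The acquire barrier forbids the read of payload from being
    // hoisted above the flag check; once ready reads true, the
    // producer's write to payload is guaranteed to be visible.
    while (!ready.load(std::memory_order_acquire)) {}
    return payload;
}
```

On x86 this costs almost nothing; on ARM and POWER it avoids the heavier fences that seq_cst would require.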
Your mutex_lock() and mutex_unlock() statements contain these memory barriers, which tell the compiler, the CPU, and the L1 cache to order the code in the way the programmer expects. No optimization is allowed "over" a mutex_lock() or mutex_unlock() statement, thanks to the memory model.
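In C++ terms: lock() acts as an acquire barrier and unlock() as a release barrier, so code inside the critical section cannot be hoisted out of it. A small sketch:

```cpp
#include <mutex>
#include <thread>

std::mutex m;
int counter = 0;

// The increment cannot be moved outside the lock/unlock pair by the
// compiler, the CPU, or the cache: lock() is an acquire barrier and
// unlock() is a release barrier.
void add_many(int n) {
    for (int k = 0; k < n; ++k) {
        std::lock_guard<std::mutex> guard(m);  // mutex_lock()
        ++counter;
    }                                          // mutex_unlock() at scope exit
}
```

Without the barriers, two threads running this loop could lose increments; with them, the final count is exact.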
But back in 2004, before the memory model was formalized, it was impossible to write a truly portable POSIX-threads implementation. (Fortunately, compilers at the time recognized the issue and solved it in their own ways: Windows had InterlockedExchange calls, and GCC had its own memory model. But the details were non-standard and non-portable.)