I think I don't understand why they're making a critical section at all.
The end goal is to initialize something no more than once, right? But the technology they're using (wrongly, but it did exist and they were clearly aware of it) to make a critical section does initialize a thing exactly once.
I also don't understand the use of SRWLock here, or rather, I sort of do but it's a hole in Microsoft's technology stack. SRWLock is a complicated (and as it turns out, buggy, but that's not relevant here) tool, but all we want here is a mutex, so we don't actually need SRWLock except that Microsoft doesn't provide the simple mutex, so this really is what you'd have written at the time† - conjure into existence the over-complicated SRWLock and just don't use most of its functionality.
† Today you should do the same trick as a Linux futex although you spell it differently in Windows. It's also optionally smaller, which is nice for this sort of job, a futex costs 4 bytes but in Windows we can spend just one byte.
> The end goal is to initialize something no more than once, right? But the technology they're using (wrongly, but it did exist and they were clearly aware of it) to make a critical section does initialize a thing exactly once.
FTA the InitializeCriticalSectionOnDemand function calls RtlRunOnceExecuteOnce to perform the bookkeeping work of running a function once. RtlRunOnceExecuteOnce is passed a function pointer to do the actual grunt work of running once, in this case, initializing a critical section.
The problem is that RtlRunOnceExecuteOnce EXPECTS the function pointer to return non-zero for success and zero for failure.
InitializeCriticalSectionOnce ALWAYS returns STATUS_SUCCESS (== 0) because it CAN'T FAIL because InitializeCriticalSection can't fail. So InitializeCriticalSectionOnce is telling RtlRunOnceExecuteOnce that it failed when it actually succeeded.
NT kernel devs deal mostly with APIs that return NTSTATUS values, where 0 (== STATUS_SUCCESS) is success and non-zero is a failure.
The problem comes when another thread calls the same code. The RtlRunOnceExecuteOnce API believes that the underlying code failed before, so it tries initializing again, initializing the same critical section again (even if the crit-sect is held by another thread currently).
In goes another thread since the critical section has just been initialized and you've got multiple threads in the same 'protected' code now. Might as well not have a crit-sect.
> Today you should do the same trick as a Linux futex although you spell it differently in Windows. It's also optionally smaller, which is nice for this sort of job, a futex costs 4 bytes but in Windows we can spend just one byte.
I don't really care about narrower futexes, what I do wish Linux had was 8 byte futexes. It's at times hard to cram enough state into 4 bytes for more complicated things.
E.g. for postgres' buffer state it'd be very useful to be able to have buffer state bits, pin count, shared lockers and the head of the lock wait queue in one futex "protectable" unit.
There are also ABA style issues that would be easier to defend against with a wider futex.
I find this unlikely. Do have some real world evidence for this? Microbenchmarks are not at all useful for this stuff in my experience. For speed: In the uncontended case, which is what we mostly care about because if you're contended it's game over for speed, they're both a single atomic CAS, so that's no difference.
In terms of size, the pointer is bigger than you'd want - it's no pthread_mutex or the analogous Windows data structure where it's a multi cache line disaster of a data structure - but it's clearly worse than a futex or WaitOnAddress solution.
A few years ago I compared them, it was not a microbenchmark, but a real application. There were a few million(almost entirely uncontended) exclusive locks being taken on startup, SRWLock was consistently faster, though the difference was not large.
> The end goal is to initialize something no more than once, right?
Took me a while to understand as well. They unregister the handler at the end of the critical section, so the requirement is not "no more then once ever", it's "no more than once at the same time".
The idea of SRWLock is, as its name might suggest, it's a Simple Reader Writer Lock.
This makes it in theory excellent for implementing this precise feature which you will see in for example C++ (the MSVC implementation is, in fact, just an SRWLock) or Rust
So immediately it's over-complicated for a mutex because we don't need it to handle multiple readers, we're never taking its shareable reader lock, we (as you see in Raymond's example code) just rely entirely on the exclusive lock which is about 10% of its function.
SRWLock is the size of a pointer - because of course in fact it is a pointer, to a lazily allocated structure - with some flag bits squirrelled away at the bottom of the pointer.
The exciting bug is that at some point whoever was implementing this for Windows screwed up while implementing the (necessary for performance) unfair lock stealing strategy.
Brief aside about "lock stealing": If I want to take a lock, and I notice that somebody else just released that lock, I can just take it and we're done - this is unfair because there may be a queue of waiters, and I cut in ahead of them, but it has better performance. This is a simple and common feature for a mutex - fair locks are sometimes useful but definitely should not be the default.
Now, in SRWLock the lock stealing code doesn't remember whether you're trying to take the shared reader lock or the exclusive writer lock. So, it just always steals the exclusive lock - even when that's not what you wanted. As a result in code relying on SRWLock if you take the reader lock and then wait on somebody else to also get that lock (which they should be able to because it's a shared lock) this sometimes deadlocks. Oops.
I've worked in and around srwlock for a long time. I don't think it's all that reasonable to think you should get "shared starve exclusive" by default, which is what you're asking for. We used to demand that kind of behavior in some windows filesystem code and it's always been a design mistake.
I don't think SharedStarveExclusive is what GP is talking about? From the description, it sounds like currently, a shared acquire immediately after an exclusive release will get silently 'upgraded' to an exclusive acquire in some cases (and remain exclusive until released), which can easily cause issues. And it can't even be detected or worked around, since the SRWLock functions don't return any information about the status of the lock.
It looks like the Rust standard library migrated away from SRWLOCK on account of this issue [0].
Yeah, as that Rust issue states, this was first replicated by a C++ programmer and STL (the person, not the Microsoft name for the C++ stdlib) said it looks like a bug in SRWLock.
There's a Microsoft internal ticket which I can't read (but STL can because he's a Microsoft employee) but there's also a GitHub issue for STL (this time the stdlib, a GitHub project, although it was opened by the person) https://github.com/microsoft/STL/issues/4448 and it's confirmed in that issue this is a bug in SRWLock, thus no work for STL† (either the project or the person).
It's not unusual (especially in C++ but this happens anywhere) that people would rather contort themselves to believe crazy things than accept that it's just a bug.
† Both Windows and C++ holds themselves to a very low standard, "Yeah, it's broken, too bad" is acceptable. Once you've determined that it's a bug you're done.
Wow, thanks for the links. I was previously unaware of this issue. I checked the referenced bug and I see that it is fixed and will be released in an upcoming version of Windows.
In some rare cases you can get an exclusive lock when you asked for a shared lock, most code won't care and will still work correctly, but sometimes people do weird shit that they probably should have used a different construct for, and run into this.
It's fair to say that most people can't hit this bug, or rather, they can't hit the grave consequence (a deadlock). To deadlock you need to specifically need shared access when you asked for shared access. A lot of people have code where they want shared access for improved performance, but they don't need it. This bug makes their code a tiny bit slower in some edge cases (where it took the wrong lock), they're not overjoyed about the bug but it has no significant consequences.
Thanks for the explanation. Sometimes it's worth it to keep weird shit working, but it's hard to predict which subset of weird shit will happen in practice.
The end goal is to initialize something no more than once, right? But the technology they're using (wrongly, but it did exist and they were clearly aware of it) to make a critical section does initialize a thing exactly once.
I also don't understand the use of SRWLock here, or rather, I sort of do but it's a hole in Microsoft's technology stack. SRWLock is a complicated (and as it turns out, buggy, but that's not relevant here) tool, but all we want here is a mutex, so we don't actually need SRWLock except that Microsoft doesn't provide the simple mutex, so this really is what you'd have written at the time† - conjure into existence the over-complicated SRWLock and just don't use most of its functionality.
† Today you should do the same trick as a Linux futex although you spell it differently in Windows. It's also optionally smaller, which is nice for this sort of job, a futex costs 4 bytes but in Windows we can spend just one byte.