I think I don't understand why they're making a critical section at all. The end...

canucker2016 · 2025-03-23T18:33:03 1742754783

> The end goal is to initialize something no more than once, right? But the technology they're using (wrongly, but it did exist and they were clearly aware of it) to make a critical section does initialize a thing exactly once.

FTA the InitializeCriticalSectionOnDemand function calls RtlRunOnceExecuteOnce to perform the bookkeeping work of running a function once. RtlRunOnceExecuteOnce is passed a function pointer to do the actual grunt work of running once, in this case, initializing a critical section.

The problem is that RtlRunOnceExecuteOnce EXPECTS the function pointer to return non-zero for success and zero for failure.

InitializeCriticalSectionOnce ALWAYS returns STATUS_SUCCESS (== 0) because it CAN'T FAIL because InitializeCriticalSection can't fail. So InitializeCriticalSectionOnce is telling RtlRunOnceExecuteOnce that it failed when it actually succeeded.

NT kernel devs deal mostly with APIs that return NTSTATUS values, where 0 (== STATUS_SUCCESS) is success and non-zero is a failure.

The problem comes when another thread calls the same code. The RtlRunOnceExecuteOnce API believes that the underlying code failed before, so it tries initializing again, initializing the same critical section again (even if the crit-sect is held by another thread currently).

In goes another thread since the critical section has just been initialized and you've got multiple threads in the same 'protected' code now. Might as well not have a crit-sect.

anarazel · 2025-03-23T13:57:22 1742738242

> Today you should do the same trick as a Linux futex although you spell it differently in Windows. It's also optionally smaller, which is nice for this sort of job, a futex costs 4 bytes but in Windows we can spend just one byte.

I don't really care about narrower futexes, what I do wish Linux had was 8 byte futexes. It's at times hard to cram enough state into 4 bytes for more complicated things.

E.g. for postgres' buffer state it'd be very useful to be able to have buffer state bits, pin count, shared lockers and the head of the lock wait queue in one futex "protectable" unit.

There are also ABA style issues that would be easier to defend against with a wider futex.

colanderman · 2025-03-23T13:44:12 1742737452

They're calling `(*Callback)(g_myProvider, Context)` in the context of the critical section. So presumably that's also important to run exclusively.

(The tracing handler getting registered twice is just a canary.)

Wumpnot · 2025-03-23T12:47:38 1742734058

SRWLock perf is slightly better than Window WaitOnAddress stuff, and works on older versions of windows.

tialaramex · 2025-03-23T14:31:04 1742740264

I find this unlikely. Do have some real world evidence for this? Microbenchmarks are not at all useful for this stuff in my experience. For speed: In the uncontended case, which is what we mostly care about because if you're contended it's game over for speed, they're both a single atomic CAS, so that's no difference.

In terms of size, the pointer is bigger than you'd want - it's no pthread_mutex or the analogous Windows data structure where it's a multi cache line disaster of a data structure - but it's clearly worse than a futex or WaitOnAddress solution.

Wumpnot · 2025-03-23T14:35:12 1742740512

A few years ago I compared them, it was not a microbenchmark, but a real application. There were a few million(almost entirely uncontended) exclusive locks being taken on startup, SRWLock was consistently faster, though the difference was not large.

tialaramex · 2025-03-23T14:45:12 1742741112

Thanks, I wonder what that's all about.

valicord · 2025-03-23T14:33:31 1742740411

> The end goal is to initialize something no more than once, right?

Took me a while to understand as well. They unregister the handler at the end of the critical section, so the requirement is not "no more then once ever", it's "no more than once at the same time".

dgellow · 2025-03-23T12:41:20 1742733680

What makes SRWLock buggy or over complicated?

tialaramex · 2025-03-23T14:18:27 1742739507

The idea of SRWLock is, as its name might suggest, it's a Simple Reader Writer Lock.

This makes it in theory excellent for implementing this precise feature which you will see in for example C++ (the MSVC implementation is, in fact, just an SRWLock) or Rust

So immediately it's over-complicated for a mutex because we don't need it to handle multiple readers, we're never taking its shareable reader lock, we (as you see in Raymond's example code) just rely entirely on the exclusive lock which is about 10% of its function.

SRWLock is the size of a pointer - because of course in fact it is a pointer, to a lazily allocated structure - with some flag bits squirrelled away at the bottom of the pointer.

The exciting bug is that at some point whoever was implementing this for Windows screwed up while implementing the (necessary for performance) unfair lock stealing strategy.

Brief aside about "lock stealing": If I want to take a lock, and I notice that somebody else just released that lock, I can just take it and we're done - this is unfair because there may be a queue of waiters, and I cut in ahead of them, but it has better performance. This is a simple and common feature for a mutex - fair locks are sometimes useful but definitely should not be the default.

Now, in SRWLock the lock stealing code doesn't remember whether you're trying to take the shared reader lock or the exclusive writer lock. So, it just always steals the exclusive lock - even when that's not what you wanted. As a result in code relying on SRWLock if you take the reader lock and then wait on somebody else to also get that lock (which they should be able to because it's a shared lock) this sometimes deadlocks. Oops.

neerajsi · 2025-03-23T17:34:00 1742751240

I've worked in and around srwlock for a long time. I don't think it's all that reasonable to think you should get "shared starve exclusive" by default, which is what you're asking for. We used to demand that kind of behavior in some windows filesystem code and it's always been a design mistake.

LegionMammal978 · 2025-03-23T21:45:36 1742766336

I don't think SharedStarveExclusive is what GP is talking about? From the description, it sounds like currently, a shared acquire immediately after an exclusive release will get silently 'upgraded' to an exclusive acquire in some cases (and remain exclusive until released), which can easily cause issues. And it can't even be detected or worked around, since the SRWLock functions don't return any information about the status of the lock.

It looks like the Rust standard library migrated away from SRWLOCK on account of this issue [0].

[0] https://github.com/rust-lang/rust/issues/121949

tialaramex · 2025-03-24T13:27:53 1742822873

Yeah, as that Rust issue states, this was first replicated by a C++ programmer and STL (the person, not the Microsoft name for the C++ stdlib) said it looks like a bug in SRWLock.

There's a Microsoft internal ticket which I can't read (but STL can because he's a Microsoft employee) but there's also a GitHub issue for STL (this time the stdlib, a GitHub project, although it was opened by the person) https://github.com/microsoft/STL/issues/4448 and it's confirmed in that issue this is a bug in SRWLock, thus no work for STL† (either the project or the person).

It's not unusual (especially in C++ but this happens anywhere) that people would rather contort themselves to believe crazy things than accept that it's just a bug.

† Both Windows and C++ holds themselves to a very low standard, "Yeah, it's broken, too bad" is acceptable. Once you've determined that it's a bug you're done.

neerajsi · 2025-03-24T14:06:12 1742825172

Wow, thanks for the links. I was previously unaware of this issue. I checked the referenced bug and I see that it is fixed and will be released in an upcoming version of Windows.

Wumpnot · 2025-03-23T14:17:52 1742739472

In some rare cases you can get an exclusive lock when you asked for a shared lock, most code won't care and will still work correctly, but sometimes people do weird shit that they probably should have used a different construct for, and run into this.

colonwqbang · 2025-03-23T22:01:46 1742767306

The point of an RW lock is that it supports multiple concurrent readers. Why is it weird to expect a non exclusive lock operation to be non exclusive?

tialaramex · 2025-03-24T16:34:53 1742834093

It's fair to say that most people can't hit this bug, or rather, they can't hit the grave consequence (a deadlock). To deadlock you need to specifically need shared access when you asked for shared access. A lot of people have code where they want shared access for improved performance, but they don't need it. This bug makes their code a tiny bit slower in some edge cases (where it took the wrong lock), they're not overjoyed about the bug but it has no significant consequences.

neerajsi · 2025-03-24T14:09:27 1742825367

Thanks for the explanation. Sometimes it's worth it to keep weird shit working, but it's hard to predict which subset of weird shit will happen in practice.