redixhumayun's comments

redixhumayun · 2025-09-18T05:18:03 1758172683

> AI has also been a really good brainstorming partner - especially if you prompt it to disable sycophancy.

Can you share more about how you are prompting to disable sycophancy?

redixhumayun · 2025-05-06T20:33:22 1746563602

Has NewSQL really stalled though? Yugabyte, Cockroach & Spanner all seem to be doing fine?

Hyper even got acquired by tableau a few years ago

redixhumayun · 2025-01-29T14:48:13 1738162093

A little trip down performance testing lane as a way to get familiar with the tooling around profiling and performance testing.

Led to a few happy discoveries of compiler optimizations in Rust.

redixhumayun · on Feb 6, 2024

I wrote a little bit about extendible hash tables and put some effort into creating visualisations around them. Mostly because this is something I struggle with when learning about a new topic - how to "see" what's happening

redixhumayun · on Jan 12, 2024

This is awesome! Also, I'm pretty sure I've come across your repo before from a SO answer (if I remember correctly)

redixhumayun · on Jan 12, 2024

Honestly, now that I look back at it having written it a couple of weeks ago, it doesn't feel that long. But, writing it felt incredibly wrong because I was encountering memory models for the first time.

I think for someone who has no exposure to it before, its quite dense (perhaps long wasn't the best choice of wording)

Also, I've been meaning to read The Art of Multiprocessor Programming. I've heard great things about it!

menaerus · on Jan 12, 2024

You shouldn't feel bad about it - the blog is going to have its audience, so don't worry about it. That said, the topic is incredibly complex and to understand it fully requires intimate knowledge of the CPU microarchitectural details and design. So, technically speaking even the links from the comment you're replying to are providing a shallow although somewhat longer introduction. Programming languages only provide an abstraction for these very real things happening in the silicon so that's about as far as they can go by providing the sufficient amount of details. The real meat is down the rabbit hole of the CPU and memory subsystem design and if you want to go there I'd suggest the yt lectures from ETH Zurich on the topic of computer architectures and design (can find the link later).

miloudi · on Jan 12, 2024

are you talking about this one? https://www.youtube.com/watch?v=AJBmIaUneB0&list=PL5Q2soXY2Z...

menaerus · on Jan 13, 2024

Yes, that's the one. These lectures seem to repeat yearly so there're also playlists with the updated content from 2021, 2022 and 2023.

There's also an "advanced" playlist and there're plenty of other interesting lectures about the memory.

redixhumayun · on Jan 12, 2024

That concurrent queue is only there to illustrate usage of CAS in a data structure. I think having an actual implementation of a concurrent queue along with handling the ABA problem might be an entirely separate post.

I added in a note about the ABA problem but perhaps you're seeing a cached version of the post.

redixhumayun · on Jan 12, 2024

Hey, so I'm curious why having memory barriers span across threads is the wrong mental model.

Assuming that the memory barrier is syncing across a single variable (in this case ready), why would it be correct to think of it as two separate barriers? If it were correct to think of it as two separate barriers on two separate threads, wouldn't there need to be some form of synchronization or linkage between the two barriers themselves so that memory barriers can be coupled together?

For instance, if I had release-acquire models on two variables, ready and not_ready, using separate barriers as representation might look something like this

```

  Thread 1                Memory                  Thread 2
  ---------               -------                 ---------
  |                          |                          |
  |   write(data, 100)       |                          |
  | -----------------------> |                          |
  |                          |                          |
  | ====Memory Barrier====== |                          |
  |   store(ready, true)     |                          |
  | -----------------------> |                          |
  | ====Memory Barrier====== |                          |
  |                          |                          |
  | ====Memory Barrier====== |                          |
  |   store(not_ready, true) |                          |
  | -----------------------> |                          |
  | ====Memory Barrier====== |                          |
  |                          |                          |
  |                          | ===Memory Barrier======= |
  |                          |   load(ready) == true    |                   
  |                          | <----------------------  |
  |                          | ====Memory Barrier=====  |
  |                          |                          |
  |                          |.===Memory Barrier======= |
  |                          |   load(not_ready) == true|                   
  |                          | <----------------------  |
  |                          | ====Memory Barrier=====  |
  |                          |                          |
  |                          |       read(data)         |
  |                          | <----------------------  |
  |                          |                          |

```

Now, how does the processor know which memory barriers are linked together? I ask because without understanding which barriers are linked together, how is instruction re-ordering determined?

gpderetta · on Jan 12, 2024

The linking of barriers in pair is really just a mental model, not (usually) what happens at the hardware level. In fact in the C++ memory model the synchronizes-with relationship is load and stores, not barriers, which indirectly affect the properties of load and stores around them. That's another reason why I don't really like the memory barrier model and I prefer to think in terms of happens-before dependency graphs.

edit: AFAIK, seq_cst ordering (as opposed to acq_rel) is only relevant when you have more than two threads and you care about things like IRIW. In this case acquires and releases are not enough to capture the full set of constraints, although at the hardware level it is still everything local.

edit2: I guess the missing bit is that beyond the hardware fences you have the hardware cache coherency protocol that makes sure that a total order of operations always exist once load and stores reach the coherence fabric.

redixhumayun · on Jan 12, 2024

Yeah, I see your point about thinking in terms of dependency graphs. I actually got the idea for using a visual memory barrier from the Linux docs(https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...) and the C++ concurrency in action book.

>I guess the missing bit is that beyond the hardware fences you have the hardware cache coherency protocol that makes sure that a total order of operations always exist once load and stores reach the coherence fabric.

Can you explain more about this?

bluetomcat · on Jan 12, 2024

Modern processors are out-of-order execution beasts. A barrier within a thread serves to enforce some ordering within that thread - that a store will occur after another store, and that a load will occur before another load. Threads know nothing of each other.

redixhumayun · on Jan 12, 2024

Are there tools out there that can prove semantic invariants in multi-threaded code? I don't understand how there can be automated tools around it at all because how would that even be possible?

jeffreygoesto · on Jan 12, 2024

TLA+ comes to mind

[0] https://www.learntla.com/index.html

kangda123 · on Jan 12, 2024

There are model checkers such as nidhugg (C++), dscheck (ocaml). They take a test case and reach all possible terminal states by trying different interleavings.

Crucially, they don’t have to try all interleavings to reach all terminal states, making the enumeration quite fast.

maxbond · on Jan 12, 2024

Rust comes to mind.

redixhumayun · on Jan 12, 2024

How would Rust solve this problem?

maxbond · on Jan 12, 2024

All I meant was that it "proves semantic invariants in multi-threaded code," which proves the concept.

kaba0 · on Jan 12, 2024

No data races is just a very tiny subset of semantic invariants, though.

redixhumayun · on Jan 12, 2024

I assumed what the poster above meant was that Rust can take care of more than just data races. Specifically Rust can solve the ABA problem somehow?

maxbond · on Jan 12, 2024

Rust won't solve the ABA problem, no. You'd be in unsafe Rust if you were writing something that could encounter the ABA problem.

You wondered out loud how it was even possible to do that kind of analysis, and that's where my mind went. Evidently people think it's a bad take. That's as deep as it goes.

infamouscow · on Jan 12, 2024

The ABA problem is a false-positive execution of a CAS speculation on a shared memory location.

It is very easy to create an ABA problem in safe Rust. Data race free sequential consistency, which Rust has, is almost completely orthogonal to the ABA problem.

This is an area of active PLT research, we haven't come anywhere close to addressing the problem in the general case.

I suspect we'll be seeing all kinds of bugs caused by a generation of programmers thinking everything has guard rails in Rust because "safety", so they can turn their brain off and not think. In reality, those promises of safety largely disappear when threads, files, signals, and networks are involved.

At the end of the day, your programs run on computers which exist in the physical world. The abstractions are mostly isomorphic, but it's at the margins where the abstractions aren't isomorphic that all the interesting things happen.

maxbond · on Jan 12, 2024

> The ABA problem is a false-positive execution of a CAS speculation on a shared memory location.

In safe Rust, if I have a mutable reference to Foo, and Foo contains a shared reference to Bar, then no other thread has a mutable reference to Foo or Bar. So no other thread will make a CAS on my reference to Bar, or drop Bar and then allocate something at the same memory address, etc.

You could have some higher level ABA problem I suppose, where you acquire a lock, read a value, give up the lock, and then make spurious assumptions about what happens while you've let the lock go. But that's obviously not what we're talking about if we're talking about CAS. (ETA: or if these were application level references, eg indices into a list.)

If we're going to implement a lockfree data structure, we're going to need unsafe Rust to hand-roll interior mutability. Because we're going to be sharing mutable state. Which isn't allowed in safe Rust.

Or am I mistaken?

infamouscow · on Jan 13, 2024

This demonstrates the ABA problem in safe Rust: https://play.rust-lang.org/?version=stable&mode=debug&editio...

Substitute the sleep with a combination of doing computation/work and the OS thread scheduler, and you can see how the bug surfaces.

maxbond · on Jan 13, 2024

I guess? I've only ever heard about the ABA problem in reference to pointers, eg in the context of lockfree queues. Maybe that's my ignorance. (Which is why I addressed shared references in my comment.)

Yes, if you don't hold a lock on a value, or exert some kind of control at the API level (eg making it monotonic so your CAS will work), you can't make assumptions about it. I think you'll find that Rust developers understand that concept about as well as any other community of concurrent developers.

But yes, granted, the semantic information about these integers isn't represented in Rust's type system, and won't be caught by it's static analysis.