AArch64 does indeed have a proper atomic max, but even on x86-64 you can get a wait-free atomic max as long as you only need to support integers below 64. In that case you can simply do a `lock or` with `1 << i`, and the highest set bit is your maximum. You can even support larger ranges by using multiple words, e.g. four 64-bit words for a u8 maximum.
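To make the bit trick concrete, here is a minimal Rust sketch for the single-word case. The `BitMax` type and its method names are made up for illustration, and relaxed ordering is an assumption that only holds if the maximum is not used to synchronize other data.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical helper: a wait-free "max" register for values in 0..64.
/// Bit i is set once value i has been recorded.
pub struct BitMax(AtomicU64);

impl BitMax {
    pub const fn new() -> Self {
        BitMax(AtomicU64::new(0))
    }

    /// Record `value`, which must be below 64.
    /// This is a single atomic OR with no retry loop, hence wait-free.
    pub fn record(&self, value: u32) {
        assert!(value < 64, "only values 0..64 fit in one 64-bit word");
        self.0.fetch_or(1u64 << value, Ordering::Relaxed);
    }

    /// The current maximum is the index of the highest set bit,
    /// or `None` if nothing has been recorded yet.
    pub fn max(&self) -> Option<u32> {
        let bits = self.0.load(Ordering::Relaxed);
        if bits == 0 {
            None
        } else {
            Some(63 - bits.leading_zeros())
        }
    }
}
```

On x86-64 a `fetch_or` whose result is discarded typically compiles down to a single `lock or`, but that is up to the compiler. The multi-word variant would index into an array of four such words with the top two bits of the value.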
In most cases it's even better to just store a maximum per thread separately and loop over all threads once to compute the current maximum if you really need it.
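A rough sketch of the per-thread approach in Rust, assuming a fixed number of threads that each know their own index. `PerThreadMax`, `Slot`, and the 64-byte alignment (a guess at the cache line size, to avoid false sharing) are all illustrative choices, not anything prescribed above.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// One slot per thread, padded to an assumed 64-byte cache line
/// so that writers on different threads do not share a line.
#[repr(align(64))]
struct Slot(AtomicU64);

/// Hypothetical container holding a separate maximum for each thread.
pub struct PerThreadMax {
    slots: Vec<Slot>,
}

impl PerThreadMax {
    pub fn new(num_threads: usize) -> Self {
        PerThreadMax {
            slots: (0..num_threads).map(|_| Slot(AtomicU64::new(0))).collect(),
        }
    }

    /// Must only be called by the thread that owns `thread_id`.
    /// Since there is a single writer per slot, a plain load and store
    /// is enough; no read-modify-write instruction is needed.
    pub fn record(&self, thread_id: usize, value: u64) {
        let slot = &self.slots[thread_id].0;
        if value > slot.load(Ordering::Relaxed) {
            slot.store(value, Ordering::Relaxed);
        }
    }

    /// Loop over all threads once to compute the current maximum.
    /// Returns 0 if there are no slots or nothing was recorded.
    pub fn max(&self) -> u64 {
        self.slots
            .iter()
            .map(|s| s.0.load(Ordering::Relaxed))
            .max()
            .unwrap_or(0)
    }
}
```

The trade-off is that `max` costs one load per thread and is not a single atomic snapshot, which is usually fine if you only read the maximum occasionally.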
I take it you never wrote code involving atomic pointers. Regardless of memory usage, a lot of platforms only provide single-word atomics (efficiently), making bitpacking crucial for lockfree algorithms.
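As an illustration of the kind of bitpacking meant here, the sketch below packs a 32-bit index and a 32-bit version counter into one 64-bit word so both can be updated with a single-word compare-and-swap. It uses an index rather than a real pointer to stay in safe Rust, and `TaggedIndex`, `pack`, and `unpack` are hypothetical names.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical example: index in the low 32 bits, version in the high 32 bits.
pub struct TaggedIndex(AtomicU64);

fn pack(index: u32, version: u32) -> u64 {
    ((version as u64) << 32) | index as u64
}

fn unpack(word: u64) -> (u32, u32) {
    (word as u32, (word >> 32) as u32)
}

impl TaggedIndex {
    pub fn new(index: u32) -> Self {
        TaggedIndex(AtomicU64::new(pack(index, 0)))
    }

    /// Returns the current (index, version) pair.
    pub fn load(&self) -> (u32, u32) {
        unpack(self.0.load(Ordering::Acquire))
    }

    /// Replace the index only if both index and version still match
    /// `expected`, bumping the version on success. The version makes a
    /// stale snapshot fail even if the same index has come back (ABA).
    pub fn try_replace(&self, expected: (u32, u32), new_index: u32) -> bool {
        let old = pack(expected.0, expected.1);
        let new = pack(new_index, expected.1.wrapping_add(1));
        self.0
            .compare_exchange(old, new, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
    }
}
```

Without packing, the index and the version would need a double-word CAS or a lock to be updated together.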
Well, years back I released an unstable sort called pdqsort in C++. Then stjepang ported it to the Rust standard library. So at first... nothing. Someone else did it.
A couple years later I was doing my PhD and I spent a lot of time optimizing a stable sort called glidesort. Around the same time Lukas Bergdoll started work on their own and started providing candidate PRs to improve the standard library sort. I reached out to him and we agreed to collaborate instead of compete, and it ended up working out nicely I'd say.
Ultimately I like tinkering with things and making them fast. I actually really like reinventing the wheel, finding out why it has the shape that it does, and seeing if there's anything left to improve.
But it feels a bit sad to do all that work only for it to disappear into the void. It makes me the happiest if people actually use the things I build, and there's no broader path to getting things into people's hands than having them power the standard library.
Every developer I've talked to has had the same experience with compilation caches as me: they're great. Until one day you waste a couple hours of your time chasing a bug caused by a stale cache. From that point on your trust is shattered, and there's always a little voice in the back of your head when debugging something which says "could this be caused by a stale cache?". And you turn it off again for peace of mind.
What kind of compilation caches, something like ccache[1]? Do you use it, or would you? It is for C and C++. Check out the features, they are pretty neat, IMO!