> It also consumes large amounts of memory and in the very unlikely worst case, may require 2x MAX_THREADS replicas of the sequential object.
Depending on how many threads are involved, I have a simple construction [0] that is likely to scale much better. It uses multiple copies of the data structure, but each successive copy is used only half as often, resulting in logarithmic amortized complexity.
IMO these types of constructions only make sense when reads are far more common than writes. Using this for an std::set is ridiculous — if you really want to apply one of these constructions to a mapping, you should build a whole new optimized set structure for each write. Or maybe you could cleverly use immutable data structures to improve memory and bandwidth usage.
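To make the halving schedule concrete, here's a minimal single-threaded Java sketch. All the names are made up and it ignores the concurrency details of the linked construction entirely; it only illustrates the update schedule where each successive copy is touched half as often:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical single-threaded sketch of the amortized schedule (not the
// linked paper's algorithm): replica i replays the operation log only
// every 2^i writes, so each successive copy is touched half as often.
class HalvingReplicas<T> {
    private final List<T> replicas = new ArrayList<>();
    private final List<Integer> applied = new ArrayList<>(); // ops already applied to replica i
    private final List<Consumer<T>> log = new ArrayList<>(); // full operation log
    private long writes = 0;

    HalvingReplicas(List<T> initialCopies) {
        for (T r : initialCopies) {
            replicas.add(r);
            applied.add(0);
        }
    }

    void write(Consumer<T> op) {
        log.add(op);
        writes++;
        // Replica i catches up only when the write count is a multiple of 2^i.
        for (int i = 0; i < replicas.size(); i++) {
            if (writes % (1L << i) == 0) {
                for (int k = applied.get(i); k < log.size(); k++) {
                    log.get(k).accept(replicas.get(i));
                }
                applied.set(i, log.size());
            }
        }
    }

    // Replica 0 syncs on every write, so it is always current.
    T freshest() {
        return replicas.get(0);
    }
}
```

Each replica adds only amortized O(1) replay work per write (replica i does 2^i work every 2^i writes), which is where the logarithmic amortized bound would come from if the number of replicas is logarithmic in the thread count.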
Yeah, I remember it. Basically, you compare-and-swap to enqueue an operation. Under contention, threads cooperate to help each other finish their operations. With this you bound the blocking and achieve wait-free concurrent operation.
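For reference, here's a stripped-down Java sketch of the enqueue-by-CAS part (names made up). Note this version is only lock-free, not wait-free — the CAS loop can retry forever under contention; Herlihy's full construction adds an announce array so that other threads help a stalled thread's operation complete:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Stripped-down sketch of "CAS to enqueue an operation" (hypothetical
// names). Lock-free only: the helping protocol that makes Herlihy's
// universal construction wait-free is omitted.
class LogBasedObject<T> {
    // Immutable log node: one enqueued operation plus the previous node.
    private static final class Node<U> {
        final UnaryOperator<U> op;
        final Node<U> prev;
        Node(UnaryOperator<U> op, Node<U> prev) { this.op = op; this.prev = prev; }
    }

    private final T initial;
    private final AtomicReference<Node<T>> head = new AtomicReference<>(null);

    LogBasedObject(T initial) { this.initial = initial; }

    // Enqueue an operation by CASing it onto the head of the shared log.
    void apply(UnaryOperator<T> op) {
        while (true) {
            Node<T> h = head.get();
            if (head.compareAndSet(h, new Node<>(op, h))) {
                return;
            }
        }
    }

    // Any thread can compute the current state by replaying the log in
    // order, so no thread ever waits on a lock holder.
    T read() {
        List<UnaryOperator<T>> ops = new ArrayList<>();
        for (Node<T> n = head.get(); n != null; n = n.prev) {
            ops.add(n.op);
        }
        T state = initial;
        for (int i = ops.size() - 1; i >= 0; i--) {
            state = ops.get(i).apply(state);
        }
        return state;
    }
}
```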
(BTW, Clojure atoms were initially thought of as the simple case of refs, where you have a full STM behind you. In practice, atoms end up being "good enough" in nearly all cases....)
I think (?) the difference between this and an atomic reference (of which a Clojure atom is a specific example) is that it is geared towards mutable data structures rather than immutable ones. Although as far as I can tell it's more or less emulating immutable data structures by copying the mutable data structure before modification.
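In Java terms, that "copy the mutable structure before modification" emulation would look something like this toy sketch — an atom-style reference over a mutable TreeSet, not Clojure's actual implementation (which swaps immutable persistent structures):

```java
import java.util.TreeSet;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Hypothetical atom over a *mutable* TreeSet: copy the current set,
// mutate the copy, publish it with compareAndSet. This emulates an
// immutable structure by never mutating a published snapshot.
class CopyOnWriteAtom {
    private final AtomicReference<TreeSet<Integer>> ref =
            new AtomicReference<>(new TreeSet<>());

    void swap(Consumer<TreeSet<Integer>> mutation) {
        while (true) {
            TreeSet<Integer> current = ref.get();
            TreeSet<Integer> copy = new TreeSet<>(current); // copy before modifying
            mutation.accept(copy);
            if (ref.compareAndSet(current, copy)) {         // publish atomically
                return;
            }
        }
    }

    // Readers never block; they just see some published snapshot.
    TreeSet<Integer> deref() {
        return ref.get();
    }
}
```

The cost is a full copy per write, which is exactly why persistent (structurally shared) data structures are the nicer fit for this pattern.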
As for the ultimate lack of widespread use of STM in real-world Clojure code, I think in addition to the nice interface provided by atoms, it's also in part because Clojure's STM lacks a couple of combinators that limit its composability, and so a lot of other concurrency approaches make more sense. See the combinators provided by https://github.com/skejserjensen/eclojure and http://www.thattommyhall.com/2013/02/15/stm-clj-vs-haskell/
Is "Universal Construction" a well-known term? I don't recall ever encountering it in my Comp. Sci. degree, and some quick searching only leads me to more academic papers.
I only know them from category theory and abstract algebra. You could think of them as polymorphic constructions from some set of assumptions. As an example, given some type of elements, the type of lists of those elements is a monoid (the free monoid) on those elements.
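For the list example, in Java terms (a toy sketch, with Haskell-ish names like mappend/mempty borrowed for flavor):

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of the free monoid: lists over some element type,
// with concatenation as the operation and the empty list as identity.
class FreeMonoid {
    static <A> List<A> mappend(List<A> x, List<A> y) {
        List<A> r = new ArrayList<>(x);
        r.addAll(y);
        return r;
    }

    static <A> List<A> mempty() {
        return new ArrayList<>();
    }
}
```

Concatenation is associative and the empty list is a two-sided identity, which is all the monoid laws ask; "free" means lists satisfy no equations beyond those laws.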
> How would this compare with RCU techniques, as used in Linux? Is it about getting finer-grained access without compromising speed?
I'm wondering this, too. To me it sounds the same. From the "bad news" section:
> CX serializes all mutative operations, which means it's flat at best if the workload consists of only mutations on the sequential object. ... It also consumes large amounts of memory and in the very unlikely worst case, may require 2x MAX_THREADS replicas of the sequential object.
That's roughly the same text I'd expect to see on an RCU implementation. (Maybe the exact number of replicas differs; I think with some RCU schemes it's GCed asynchronously and is theoretically unbounded.) At first glance, it's hard to understand what's new here. They say they used this to make "the world's first wait-free balanced binary tree". How does that description apply to this but not to RCU?
...but that's not wait-free, is it? What happens if a process crashes or a thread is de-scheduled while holding the lock? Or just isn't scheduled again very quickly?
Glancing at OP, it appears CX itself uses an RW lock.
"Several years ago, when Andreia first came up with the idea that we could use a reader-writer lock to achieve wait-freedom, I thought that was crazy, but cool. After joining up forces with Pascal and a lot of work, we were able to turn that into a universal construction, CX."
I guess they must be using the lock in a careful way to avoid any contention, or to distribute people who need the lock across many versions of a data structure.
That wouldn't be the same as a 'simple mutex', would it? A simple mutex is absolutely not wait-free (even if I don't understand how the OP is using locks).
Yes - a lock can redirect people trying to acquire it, rather than queueing them. So processes need a lock to access a resource, but there are enough resources for everyone to get their own without waiting, and enough that if a process crashes holding one it doesn't cause anyone else to wait. That's not a 'simple mutex' though.
[0] https://link.springer.com/chapter/10.1007%2F978-3-540-92221-...