Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> CRDTs don't _ensure_ ops are commutative (and associative, and idempotent) but rather they _require_ that ops are commutative (and associative, and idempotent)

I disagree. You can create a CRDT flavor of data structure whose ops are not commutative. For example, a Set's add and delete operations. These are not commutative. You cannot switch the order of the ops for meaningfully processing them. However, you can create a CRDT Set. You do that by adding metadata to the ops, and having the instances always process them in the only order that makes sense even if such instances receive the ops in a different order. In that sense, you are "ensuring" ops are behaving like they are commutative and not "requiring" them to be so.

> it's definitely not the case that any data structure is a CRDT

I could have worded my statement better. I meant any data structure that has the aforementioned duality property is a CRDT. Not that any data structure unconditionally can be translated into a CRDT.

> an important "eureka" observation about CRDTs is that the op-based model is theoretical,

I do not understand your statement. Perhaps you could elaborate. My understanding is that a CRDT is op-based or state-based depending on what is "communicated" between the instances. If ops are communicated, then it is op-based CRDT, whereas if states (or delta-states) are communicated, then it is state-based.

At least in that sense, op-based model is NOT theoretical. Perhaps you have a different point in mind that I fail to observe.



> > CRDTs don't _ensure_ ops are commutative (and associative, and idempotent) but rather they _require_ that ops are commutative (and associative, and idempotent)

> I disagree. You can create a CRDT flavor of data structure whose ops are not commutative

i think it's important to reiterate (for other third-party readers) that this is factually incorrect

a data structure with operations that are not commutative (or associative or idempotent) is not a (state-based) crdt by definition


> I disagree. You can create a CRDT flavor of data structure whose ops are not commutative. For example, a Set's add and delete operations. These are not commutative. You cannot switch the order of the ops for meaningfully processing them. However, you can create a CRDT Set.

set add (union) is commutative, set delete is not. so a set with only the add (union) operation is a CRDT, but a set that supports both add and delete is not a CRDT, at least not without very specific caveats.

you can create an add-only CRDT set, or an add-remove CRDT set, or a variety of other CRDT sets that support specific operations with specific caveats. but you can't create a CRDT set that is a fully-fledged set, with all the operations that sets generally provide.

> You do that by adding metadata to the ops, and having the instances always process them in the only order that makes sense even if such instances receive the ops in a different order.

so this is the crux of the issue, I think -- "having the instances always process [ops] in the same order" is basically not possible in any real-world network

first, because it's not possible to decide what the "correct" set of ops actually is, nodes are always subject to partitions, faults, etc. which prevent reliable dissemination of knowledge, and plus all the stuff about light cones and etc.

second, because "the same order" implies a specific total ordering of events is unknowable (see prior comments)

> you are "ensuring" ops are behaving like they are commutative and not "requiring" them to be so

converting a non-commutative operation to a commutative operation in a distributed system requires reliable delivery, which no network provides

the whole point of CRDTs is that they give you a formally strong version of eventual consistency that holds even in the face of (unavoidably) unreliable delivery

> My understanding is that a CRDT is op-based or state-based depending on what is "communicated" between the instances. If ops are communicated, then it is op-based CRDT, whereas if states (or delta-states) are communicated, then it is state-based.

this is true in the abstract, but the issue is that "what is communicated" is not a given, it's subject to the choices you make when you encounter a network fault -- or in the CAP model, a partition

CRDTs are tools for eventually consistent (AP) systems, which means that you have to keep making forward progress if there are partitions, which means that message delivery between nodes is not reliable, it can always fail

for state-based CRDTs if you fail to deliver a message it's fine, the information in that message is not lost forever, it will be included in the next message, and (if the partition is eventually healed) the ultimate state will converge. this is also true for delta states

but for op-based CRDTs if you fail to deliver a message it's not fine, the information in that message is lost forever, it won't be included in the next message, and (even if the partition is eventually healed) the ultimate state will not converge


> "having the instances always process [ops] in the same order" is basically not possible in any real-world network

By having 1) causal order (eg. using what the article refers to as Lamport Causal Clock) and 2) a deterministic sorting function to sort ops that happened concurrently (from the perspective of causal order), we can derive total order.

It’s absolutely possible and used.

And with those two properties, almost any data structure can be turned into a (op-based) CRDT.

That is to say, thoughtlede has it correct in their comments above.


this just isn't true

total order is property that can only exist over a well-defined set of messages

without reliable delivery (and stable network tomography) there is no way to establish a well-defined set of messages

causal order (via lamport clocks or otherwise) just doesn't establish total order (by itself)


We don’t assume “reliable delivery” in AP or eventually consistent systems. We assume “once all messages have been delivered…”

So if you have all messages and the two properties above, a total order can be derived.

You’re correct to say that causal order != total order as such but with the use of correct primitives, like Lamport Causal Clocks, we can get a nice and clean linear order of events :)


"correct primitives" do not by themselves provide a linear order of events

      a
     / \
    b   c
     \ /
      d
b and c are concurrent updates, how do you resolve d?

it's a trick question, you can't resolve d without information loss, unless you bring in additional knowledge from the application

you can resolve d with information loss by defining some heuristic for ordering concurrent updates, a.k.a. last-writer-wins, basically picking one of those updates deterministically

that gets you a total order, but it's cheating: whichever concurrent updates you don't choose are lost, and that violates consistency for any replica(s) that are based on that state

there is no free lunch


> "correct primitives" do not by themselves provide a linear order of events

Review the description of Lamport Causal Clock in the article. Note that it carries “additional info” (additional to the example diagram). This “additional info” is what establishes the structure needed for total order.

> whichever concurrent updates you don't choose are lost, and that violates consistency

They’re not lost! The concurrent updates not chosen are still part of the “list of operations”, but the update to the data/value itself may not be observable if the subsequent update (the update that was sorted to be first) updates the same data/value (eg. both operations update the same key in a key-value structure). If the two operations update different data/value, then both updated values are observable. This isn’t cheating, rather it works exactly as expected: it is eventually consistent.


we are only talking about updates to a specific value here, obviously updates to independent values are trivial to resolve

it's possible to construct a CRDT such that concurrent updates are merged without data loss to a single "list of operations" maintained in the object, but that's not true in general

resolving conflicts with the lww strategy, or variants of that strategy that order concurrent events by e.g. node ID, are indeed eventually consistent at the highest level, but they provide no meaningful consistency guarantees to users, because they allows "committed" writes to be lost


> that's not true in general

Can you elaborate what do you mean by this? I was arguing that it’s possible as the original argument was “this is not possible in a real system and is only theoretical”.

> provide no meaningful consistency guarantees to users, because they allows "committed" writes to be lost

If I set the (shared) value to green and you set it to blue, what is the expected observed value? What if you set it to green and I set it blue, what is the observed value? More importantly, what is the consistency that was lost?


so there are multiple threads of conversation here

first, there is no straightforward way to module "a value" as a CRDT, precisely for the reason you point out: how to resolve conflicts between concurrent updates is not well-defined

concretely, if you set v=green and I set v=blue, then the expected observed value is undefined, without additional information

there are various ways to model values as CRDTs, each has different properties, the simplest way to model a value as a CRDT is with the LWW conflict resolution strategy, but this approach loses information

example: say your v=green is the last writer and wins, then I will observe (in my local replicas) that v=blue for a period of time until replication resolves the conflict, and sets (in my local replicas) v=green. when v changes from blue to green, the whole idea that v ever was blue is lost. after replication resolves the conflict, v was never blue. it was only ever green. but there was a period of time where i observed v as blue, right? that observation was invalid. that's a problem. consistency was lost there.

--

second

> almost any data structure can be turned into a (op-based) CRDT.

yes, in theory. but op-based CRDTs only work if message delivery between nodes is reliable. and no real-world network provides this property (while maintaining availability).


> there was a period of time where i observed v as blue, right? that observation was invalid.

Not invalid. The observation was correct at that time and place, meaning, in the partition that it was observed. This is the P in AP.

It seems to me that you’re talking about and arguing for synchronization, that is, consensus about the values (=state), which takes the system to CP as opposed to AP.

> there is no straightforward way to module "a value"

I would recommend to look into the “Register” (CR)data structure.


there is no single definition of a crdt "register"

i know this because i've implemented many versions of crdt "registers", with different properties, at scale

there are lww registers and multi-value registers, the former provides deterministic (but arbitrary) conflict resolution which loses information, the latter provides deterministic and non-arbitrary conflict resolution without losing information but with a more complex API

> The observation was correct at that time and place, meaning, in the partition that it was observed. This is the P in AP.

this is not what partition means

partition means that different subsets of nodes in a single system have different views on reality

if v=blue was true only during a partition, and is no longer represented in the history of v when the partition heals, then this value is not legal, violates serializability, is incorrect, etc.

https://jepsen.io/consistency#consistency-models

this has nothing to do with synchronization


> there are lww registers and multi-value registers

Use a single value register then in place where I said “register”.

> partition means that different subsets of nodes in a single system have different views on reality

Partition means that nodes of a (single) network are (temporarily) not connected.

In the example discussed here, blue and green were in different partitions thus had “different view” to the state. Once synchronized, their view on the state is consistent and both observe both values, blue being the latest state.

> is no longer represented in the history of v when the partition heals

Please review the above discussion again and the original article. You keep saying this but it’s shown in both that it’s is not true.


"last writer wins" does not preserve state from losing (earlier) writers/writes

after synchronization, all nodes share a consistent view of state (yes) but that state has no history, it only contains the net final value (blue) as the singular and true state

this is not complicated, i think you're out of your element


> but for op-based CRDTs if you fail to deliver a message it's not fine, the information in that message is lost forever, it won't be included in the next message, and (even if the partition is eventually healed) the ultimate state will not converge

Can't you just have every node keep a history of ops, and when nodes communicate with each other they can compare clocks to know which ops to re-deliver? We should also be able to enforce idempotency this way.


in theory yes, in practice no

basically this makes peer connections stateful, but maintaining that state accurately is very difficult, especially when considering tomography changes in the system

in fact if you can manage that state correctly, you've solved a problem that's roughly the same as the problem that CRDTs solve

(in other words, you're almost certainly not gonna solve that problem correctly)




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: