
This is a clear and crisp way of differentiating delta-based vs. state-based CRDTs.

But I don't think it's accurate enough. For example, the article claims delta-based CRDTs don't use vector clocks. But even to decide whether a set of events is concurrent, it's necessary to attach a vector timestamp to every event. Maybe the author meant something else when saying vector clocks are not needed for op-based CRDTs.
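For what it's worth, here's roughly what that concurrency check looks like - a minimal sketch of vector clocks, with made-up types (not from the article or any particular library):

    // One counter per replica; missing entries are treated as 0.
    type VectorClock = Map<string, number>;

    // a happened before b iff every entry of a is <= the matching entry of b
    // and at least one entry is strictly smaller.
    function happenedBefore(a: VectorClock, b: VectorClock): boolean {
      let strictlyLess = false;
      for (const [replica, ca] of a) {
        const cb = b.get(replica) ?? 0;
        if (ca > cb) return false;
        if (ca < cb) strictlyLess = true;
      }
      // b may know about replicas that a has never seen at all.
      for (const [replica, cb] of b) {
        if (!a.has(replica) && cb > 0) strictlyLess = true;
      }
      return strictlyLess;
    }

    // Two events are concurrent when neither happened before the other.
    function concurrent(a: VectorClock, b: VectorClock): boolean {
      return !happenedBefore(a, b) && !happenedBefore(b, a);
    }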

Also, as mentioned in the article, it's true that an ever-growing event log is not such a bad idea given compression techniques and cheaper disks. But the problem with ever-growing data structures is not just disk space. The data has to be loaded into main memory to do anything useful with it, and it has to be transmitted across the network to power SaaS apps. So arguing that ever-growing data structures are fine because disks are cheap is an oversimplification.



I don't recall if it was git itself or GitHub that bragged about inverting the storage structure so that reading the current state is the simplest operation, and HEAD~3 is computed by subtracting the last 3 patches rather than fast-forwarding from the root to length - 3.

To do so requires that the patch function be reversible, which for git is not hard to achieve. I wonder how many CRDTs that is true for.
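A rough sketch of what an invertible patch could look like (illustrative only - this is not git's actual delta format):

    // A patch records both what it removes and what it adds, so it can be undone.
    type Patch = { pos: number; deleted: string; inserted: string };

    function apply(doc: string, p: Patch): string {
      return doc.slice(0, p.pos) + p.inserted + doc.slice(p.pos + p.deleted.length);
    }

    // Inverting just swaps the deleted and inserted text.
    function invert(p: Patch): Patch {
      return { pos: p.pos, deleted: p.inserted, inserted: p.deleted };
    }

    // HEAD~n: start from the current state and undo the last n patches.
    function stateAt(head: string, patches: Patch[], n: number): string {
      return patches
        .slice(patches.length - n)
        .reverse()
        .reduce((doc, p) => apply(doc, invert(p)), head);
    }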

There are ways to transmit only a partial history with git; there should be ways to do the same for most CRDTs.


There absolutely is. I’m writing a paper at the moment about how we’ve done this in diamond types. The core idea is to store the original operations and their versions, and then reconstruct the CRDT state on each peer as needed. Because most editing histories are mostly linear, it usually ends up faster than CRDTs in practice anyway. But you can then just send the latest document state, and send historical operations on demand for merging and for accessing old document states.

I’ll post the paper when it’s ready. I think it’s a great approach - file sizes end up smaller, you don’t need the CRDT in memory during editing, and you can prune old operations.
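If I'm reading this right, the storage shape is something like the sketch below (made-up names, not diamond types' actual API): keep the plain text plus the original operations tagged with versions, and only rebuild CRDT state when a merge actually needs it.

    // Each op is identified by (agent, seq) so peers can ask for what they're missing.
    type OpId = { agent: string; seq: number };
    type Op =
      | { id: OpId; kind: "insert"; pos: number; content: string }
      | { id: OpId; kind: "delete"; pos: number; len: number };

    class Doc {
      text = "";        // latest state: all a peer needs for local editing and reading
      ops: Op[] = [];   // original operations, kept for merging and old versions

      applyLocal(op: Op) {
        if (op.kind === "insert") {
          this.text = this.text.slice(0, op.pos) + op.content + this.text.slice(op.pos);
        } else {
          this.text = this.text.slice(0, op.pos) + this.text.slice(op.pos + op.len);
        }
        this.ops.push(op);
      }

      // "Historical operations on demand": given the highest seq a remote peer has
      // seen from each agent, return only the ops it is missing.
      opsSince(frontier: Map<string, number>): Op[] {
        return this.ops.filter(op => op.id.seq > (frontier.get(op.id.agent) ?? -1));
      }
    }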


Oh that's clever


Yeah, network bandwidth scaling with participant count is the main headwind - it's quadratic in most CRDTs at the moment, AFAICT.
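Back-of-the-envelope, assuming a full mesh where every peer sends each of its ops to every other peer (illustrative only, not a claim about any particular protocol):

    // Total messages per editing round grows roughly quadratically with peer count.
    function messagesPerRound(peers: number, opsPerPeer: number): number {
      return peers * opsPerPeer * (peers - 1);
    }

    console.log(messagesPerRound(10, 1));   // 90
    console.log(messagesPerRound(100, 1));  // 9900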



