
> I'd love a quick perspective on how CRDTs integrate into real app infrastructure in the real world.

There just aren't a lot of good integrations with databases yet. There's nothing technical stopping that from happening - but I don't think we even knew CRDTs could be fast enough to be practically useful until last year. So it's going to take a few years before this stuff bleeds into databases.

A list CRDT like diamond types or Automerge is just a set of changes. Changes are like git commits - they have an ID, parents, and details of the change. It's like if you wanted to store a git commit per edit / per keystroke a user makes. The algorithms / mathematical guarantees don't care how you store and send that data set.
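To make that concrete, here's a rough sketch in Rust of what one of those commit-like changes could look like. The names are made up for illustration - this isn't any library's actual format:

    // Hypothetical, minimal "change" record for a list/text CRDT.
    // Real libraries (Yjs, Automerge, diamond types) use richer,
    // heavily optimized representations.

    /// Globally unique ID: (agent id, per-agent sequence number).
    type Id = (u64, u64);

    enum Op {
        /// Insert `content` at `pos`, as the document looked when
        /// the change was made.
        Insert { pos: usize, content: String },
        /// Delete `len` characters starting at `pos`.
        Delete { pos: usize, len: usize },
    }

    struct Change {
        id: Id,           // like a commit hash
        parents: Vec<Id>, // the change(s) this one happened after
        op: Op,           // what actually changed
    }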

Aside from some clever algorithms, an (operation based) CRDT is essentially an append-only set of operations that needs to be replicated between peers. There's nothing magic about them, and any infrastructure which can do that can support CRDTs.
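Put another way, sync is just set reconciliation: send a peer whatever operations it hasn't seen, and make applying a duplicate a no-op. Continuing the hypothetical sketch from above:

    use std::collections::HashMap;

    // A peer's state is an append-only set of changes, keyed by ID.
    // "Merging" two peers is a set union.
    struct Peer {
        changes: HashMap<Id, Change>,
    }

    impl Peer {
        /// Changes the remote peer is missing, given the IDs it has.
        fn changes_for(&self, remote_has: &[Id]) -> Vec<&Change> {
            self.changes
                .values()
                .filter(|c| !remote_has.contains(&c.id))
                .collect()
        }

        /// Apply incoming changes. Re-inserting an ID we already
        /// have does nothing, so delivery is idempotent.
        fn apply(&mut self, incoming: Vec<Change>) {
            for c in incoming {
                self.changes.entry(c.id).or_insert(c);
            }
        }
    }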

So you could just store each change as an entry in a database table - and actually, that'd work fine in practice[1]. The problem is that you'd end up using a lot of space per document, because there are so many changes. Type a 10 page paper? That'll be 260k database entries! There are plenty of tricks for storing this information compactly - simple RLE encoding alone decreases the size by an order of magnitude. (Thanks to Martin Kleppmann for this work.) But lots of compact binary representation tricks don't play that nicely with modern databases. And you don't want to have to load & re-save the data each time you change something.
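The RLE trick works because edits are bursty: typing "hello" is five inserts by the same agent, with consecutive sequence numbers at consecutive positions, so the whole run can be stored as one record. A rough sketch of the idea (not any library's actual encoding):

    // Collapse runs of consecutive single-character inserts by the
    // same agent into one record.
    struct InsertRun {
        agent: u64,
        first_seq: u64,   // covers first_seq .. first_seq + len
        start_pos: usize, // position of the run's first character
        content: String,
    }

    fn push_insert(runs: &mut Vec<InsertRun>, agent: u64, seq: u64,
                   pos: usize, ch: char) {
        if let Some(last) = runs.last_mut() {
            let len = last.content.chars().count() as u64;
            // Same agent, next sequence number, next position?
            // Then this insert just extends the previous run.
            if last.agent == agent
                && seq == last.first_seq + len
                && pos == last.start_pos + len as usize
            {
                last.content.push(ch);
                return;
            }
        }
        runs.push(InsertRun {
            agent,
            first_seq: seq,
            start_pos: pos,
            content: ch.to_string(),
        });
    }

Typing a paper left to right collapses into a handful of runs; every backspace or cursor jump just starts a new one.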

One long-term answer would be to bake CRDT compaction techniques into databases themselves, but obviously that will be idiosyncratic. And there are lots of shorter-term solutions, like using multiple tables + run-length encoding values. But I think the CRDT libraries themselves (Yjs, Automerge, Diamond Types, etc) need to mature some more. I don't see much point in hacking up postgres while the set of fields & semantics are still (somewhat) in flux. (Well, yjs might be more ready for this work - but it's probably still a good idea to wait until yjs's rust port (yrs) is done.)

[1] It'd work fine the way my own CRDT (diamond types) is designed, since DT just needs to store the "origin position" of each change. In Yjs and Automerge, changes instead store a reference to the ID of the item to their immediate left. In order to generate those changes, both automerge and yjs need to look up those persistent IDs - which in practice means they need to load the document state into memory first.
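Roughly, the difference per insert looks like this (field names made up, reusing the Id type from the sketch above):

    // Diamond types style: record the *position* the insert
    // originally happened at. You can generate this without having
    // the document state loaded.
    struct DtInsert {
        origin_pos: usize,
        content: String,
    }

    // Yjs / Automerge style: record the *ID* of the item immediately
    // to the left. Mapping a cursor position to that persistent ID
    // requires the document state to be in memory.
    struct ItemInsert {
        left_origin: Id,
        content: String,
    }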



I believe that Redis supports CRDTs (https://redis.com/blog/diving-into-crdts/) for its active/active geo-replication, and they're supposed to be pretty fast.

...but it's important to note that CRDTs as used by Redis (and as defined by, for example, https://arxiv.org/pdf/1806.10254.pdf) really just mean that you've defined a way to consistently resolve conflicts - they don't necessarily give you the all-singing-all-dancing collaborative text editing that a lot of people seem to be assuming. For example, as defined in the paper above, a simple "last write wins" register is a CRDT.
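A last-write-wins register really is a complete (if boring) CRDT: merge just keeps the value with the highest timestamp, which makes merging commutative, associative and idempotent. A minimal sketch in Rust:

    // Minimal last-write-wins register. The peer ID breaks timestamp
    // ties so merge is deterministic on every replica.
    #[derive(Clone)]
    struct LwwRegister<T: Clone> {
        value: T,
        timestamp: u64,
        peer: u64,
    }

    impl<T: Clone> LwwRegister<T> {
        fn merge(&mut self, other: &LwwRegister<T>) {
            // Tuple comparison: timestamp first, then peer ID.
            if (other.timestamp, other.peer) > (self.timestamp, self.peer) {
                *self = other.clone();
            }
        }
    }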


> I don't think we even knew CRDTs could be fast enough to be practically useful until last year.

What happened last year?


Well, I should rephrase: I didn't know CRDTs could be fast enough to be practical - at least in the context of collaborative text editing. I've been in the field for over a decade, and for most of that time I was arguing against using CRDTs.

I ran a bunch of optimization experiments 18 months ago[1] and realised I was wrong. I demonstrated (at least to my own satisfaction) that all the problems CRDTs had (memory usage, disk usage, CPU usage) seemed finally solvable. Last year I put out a blog post[2] talking about how to get insane performance. Both of these posts were discussed in detail here on HN.

One result of this is that automerge (an excellent CRDT) has started optimizing for performance. I hear their performance on this benchmark has improved from 5 minutes down to ~2 seconds. Yjs can do the same thing in 1 second. Diamond types (my own CRDT) is now about 6x faster than it was when I wrote that post last year - I can run the same benchmark in ~10ms (0.01 seconds). Memory usage has dropped from automerge's 880MB to 2MB in diamond types, and I think we can get it much lower. Disk size for the same data set has dropped from 4MB of JSON down to 100KB, while still storing the full edit history. (And this is for a 100KB document!)

There are a lot of tricks involved, and plenty more in the tank. But in short, modern list / string CRDTs are now close to the speed of their equivalent native data structures. With a text CRDT in Rust, more CPU gets wasted in wasm overhead than is spent maintaining the CRDT itself.

That's not something we knew was even possible a year or two ago. So that's what's changed.

[1] https://josephg.com/blog/crdts-are-the-future/

[2] https://josephg.com/blog/crdts-go-brrr/



