OT and CRDT trade-offs for Real-Time collaboration

migueloller · on Jan 14, 2020

I’ve been doing a pretty deep dive on CRDTs and OTs to get the right user experience for our collaborative website builder[1]. I’m still not done with the implementation and subsequent testing (i.e., splitting text nodes, as mentioned in the post). But the core algorithm behind Automerge[2], formalized in the OpSets paper[3] is extremely promising. A good example of its power is the ability to do an atomic tree “move”, whilst preserving strong eventual consistency.

I'm so glad to see that the HN community cares about this problem space. Maybe I should finally write that blog post on CRDTs and how to implement them yourself following the OpSets spec.

[1] https://www.makeswift.com

[2] https://www.github.com/automerge/automerge

[3] https://arxiv.org/abs/1805.04263

sipjca · on Jan 14, 2020

I think that blog post would be a great idea, would love to read it

migueloller · on Jan 14, 2020

Really? I’ll get something out this week. My Twitter handle is in my profile. I’ll probably tweet about it when I write it.

phsource · on Jan 14, 2020

Wanderlog (https://wanderlog.com) is a Google Docs for planning travel, and naturally, we had to figure this out early in the process.

If you're on a Node.js/React stack, we highly recommend using the combination of OT-JSON0 [1] and ShareDB [2], two excellent libraries. OT-JSON0 lets you perform operational transforms on any JSON-serializable structure pretty intuitively, and ShareDB handles synchronizing it between multiple clients and resolving conflicts. Building your own is going to be a pain, so unless your core business is a text editor like TinyMCE, it's probably best to stand on the shoulders of giants.

Another approach that we didn't end up using is Figma's: especially if you're working on anything other than text editing, ask if you actually want to use operational transform. Something simpler might be good enough -- in Figma's case, they use "last write wins" because they were able to break the document into small enough, atomic elements. [3]

[1] https://www.npmjs.com/package/ot-json0

[2] https://www.npmjs.com/package/sharedb

[3] https://www.figma.com/blog/how-figmas-multiplayer-technology...

migueloller · on Jan 14, 2020

We’re actually using ShareDB at Makeswift right now and for us it has become a horrible burden. There are very subtle bugs on their data persistence layer with concurrency and the API is a bit of a mess to work with in my opinion. I’m glad to see other people are having success with it, though.

spyder81 · on Jan 14, 2020

I've been given advice that ShareDB is "pretty antiquated and not well supported". I'm told it is not difficult to rewrite in modern NodeJS, replacing things like callbacks and event emitters with async/await produces a much simpler and more reliable server.

phsource · on Jan 14, 2020

Ah that's unfortunate; we did find it a bit weird that you have to think of all changes as `ops`, but otherwise, I'm curious as to what you've had issues with.

We also have a custom back-end that's MySQL that may have changed how it work a bit

alangibson · on Jan 14, 2020

I came to the same "LWW on fragments" solution for peraspera.io. In a lot of cases, all OT gets you is being able to watch someone else's cursor move around. The new wore off on that a long time ago.

aloer · on Jan 14, 2020

Could you elaborate a bit on that?

spyder81 · on Jan 14, 2020

I'll cover this in the second post but we will be using ShareDB for our initial launch (although probably building our own once we hit production release).

spyder81 · on Jan 15, 2020

atomic elements doesn't really extend to arbitrary HTML, which is fine for the simpler editing solutions but not TinyMCE.

To do something like that in arbitrary HTML you wind up locking at the top-level-block boundaries. Locking sounds great as an engineer but it leads to a garbage user experience.

raphlinus · on Jan 14, 2020

There's one thing I'd add here, as advice to potential implementers of collaborative editing algorithms. Test the system rigorously, autogenerating concurrent workloads of all the operations (including undo), and also various delays. If this had been done early on, all the broken algorithms (some published with formal proofs of correctness!) would have been caught. A lot of the broken algorithms work under light loads - TP2 requires at least three concurrent edits, two inserts and one delete, to trigger.

spyder81 · on Jan 14, 2020

We're using property-based testing where we can, which I hope to cover in a future post

gvkhna · on Jan 14, 2020

At Mixcut.com (Building real time collaborative video editing) we use a combination of OTs and a variant/simpler version of CRDTs that provide the most coverage of the various features needed. The usage of low latency websockets to shuttle OTs and building completely around OTs is key.

It’s working great for us in beta, we’ve recently shipped multiplayer to our users and overall it’s very stable. With a lot more in store soon, please check it out!

phedkvist · on Jan 14, 2020

Thanks for linking to my blog post :)

"This is the exact "split node" scenario I described earlier; applying bold near the start of a text node necessarily has to split the text node into three parts (before, bold, and after)."

Not sure why you split it up into sections. In my CRDT implementation, I would add meta-data to each character, with the boolean property which is either bold or not. It's certainly cumbersome to keep the cursor at the right place when inserts are being made, but it's doable. https://pierrehedkvist.com/posts/1-creating-a-collaborative-...

I personally never understood how OT actually works, clearly, Google Docs and others find it useful. But to me, CRDT has more solid proof and reasoning behind it, and it is easier to comprehend. https://medium.com/@pierrehedkvist/creating-a-collaborative-...

spyder81 · on Jan 14, 2020

I definitely read a few of your posts during my research, thanks for your hard work :)

The best CRDTs do treat each character separately (as I think I mentioned at one point?). But not all of them - the video and my example is an example of one that doesn't.

It's also very difficult to support arbitrary nested trees with a per-character data type, which leads to the compromises I talked about.

strictfp · on Jan 14, 2020

Is the article suggesting to apply a string CRDT to an entire JSON structure? Are people doing that?

At the very least, I would expect different JSON data types to use different CRDTs. For a collaborative model, I would expect the semantics of the data model to be reflected in the choice of CRDTs for different substructures in the JSON.

ergl · on Jan 14, 2020

No, the state of the art CRDT solution for json is something like Automerge (https://github.com/automerge/automerge), that treats the document like a collection/combination of CRDTs of different types. (the author doesn't mention it explicitly, but it's linked at the end)

codebeaker · on Jan 14, 2020

I have been using Automerge recently for a project, and I have found it to be very, very user-friendly.

Our use-case is offline-editing of documents with an eventual sync-with-yourself-online. It's mostly there as a sync tool, not as a p2p colalb editing. Unless you count yourself as a peer, I guess!

We store a document in local storage which is the result of `Automerge.save(automergedoc) => serializable string`. That can be loaded with `Automerge.load(s) => doc`.

"Changes" are a first-class concept in automerge (makes sense, I guess) so you can do

   before = Automerge.from({..some doc..})
   after = Automerge.change(before, "optional commit msg", (doc) => doc.hello = "world);

With that done, you can get a diff, `Automerge.getChanges(before, after) => [changes]`.

Those `[changes]` are what we try and send to a server and keep them in a `Set()` for each document (background sync, service worker, etc make a nice experience.

This is all wrapped up in Redux, and a middleware that captures the changes, and a reduxStore subscriber that puts the document back into local storage after any changes.

It is for the application programmer virtually transparent, and so far we've yet to find a fault with it.

I'm sure I'll come to regret making "changes" my principle data type, as the server implementation just sees serialized blobs of data, and has no concept of the structure of the document. That's solveable if I need it ever, because there's a protcol compatible implementation in Rust, and I could always use the JavaScript one on the server too, just haven't needed yet.

The serialization is also quite large, in the order of kilobytes for what are actually quite small docs, but they do contain the original commit messages too, and the Automerge team is planning to release an optimized serialization format early this year, which they say will be especially useful for fields with lots of changes, such as text fields, which currently pay a heavy toll in terms of tracking `[changes]`. Fortunately that also doesn't affect me.

I can also +1 some of the advice to keep your whole document in a CRDT. We use React+Redux, but since React Hooks made stateless functional components a reality, we felt it was appropriate to use React exclusively for the UI state, and track only document states in Redux reducers. It feels light-weight and sustainable, although we're only about a month and 5k LOC into the project. Time will tell I guess.

ergl · on Jan 14, 2020

Yeah, personally the feature I'm looking forward to is materializing "changes" into a document without losing editing history.

At some point your set of changes grows too large or you want to trim it to, say, a few weeks or months. Afaik that isn't possible yet without losing the ability to merging later on.

eudoxus · on Jan 14, 2020

I'm farmilliar with Automerge, but not its specific features on state compression. In general change set compression is possible with CRDTs but there are some "gotchas" you need to look out for. Most notably, all peers interested in a changeset, have to be completely in sync at the moment of compression, otherwise, depending on which underlying CRDT type you use, there can end up with duplicate entries in for example a Set type. Depending on the use case, 100% peer sync is possible, in others its more challenging.

spyder81 · on Jan 14, 2020

I'm a big fan of Automerge, I look forward to where the research takes it!

jiofih · on Jan 14, 2020

So.. we’re left with no real explanation to why OT is better than CRDT other than a 5s video showing a poorly managed text node update that moves the cursor. I feel tricked.

slau · on Jan 14, 2020

Really?

> In OT, every user action is broken down into one or more operations. These operations are transmitted between clients along with their baseline reference; if two users perform actions at the same time, incoming operations must be transformed to include the local operations that have happened since that baseline. They are then applied locally and form the new baseline.

> This constant transformation of operations turned out to have too many edge cases where clients were found to not produce the same baseline (the "wrong" papers above). When that happens, the clients will never converge on the same result and break the fundamental assumption of collaboration.

And that’s exactly right. It’s fairly easy to have an OT system that is very vulnerable, because clients can cheaply generate change sets that are extremely computationally expensive on the server side. I’ve seen a system where a single mobile phone could, in a few seconds, lock up the synchronisation server for days.

jiofih · on Jan 14, 2020

But the article is arguing in favor of OT, not CRDT. A couple examples of those edge cases would go a long way, the one given is not very helpful to make a point.

Vinnl · on Jan 14, 2020

What's wrong with the one given? If my cursor moves because some other user makes some other text bold, that doesn't sound like a desirable experience?

jiofih · on Jan 14, 2020

Because it doesn’t highlight a particularly relevant deficiency of the CRDT protocol. Preserving cursor position is not hard in that case. The post even mentions stronger downsides of OT, the example does not support the conclusion. Why is OT better?

Vinnl · on Jan 14, 2020

Huh, the article gave me the impression that preserving the position was a crucial problem there.

As for the last question, I think the later posts will get to that.

eudoxus · on Jan 14, 2020

Preserving the position of the cursor is a crucial feature for collab editors. However, he doesnt go into detail explaining why CRDTs make this feature impossible. Sure the CRDT route splits the text, but I fail to see why that makes cursor pos preservation impossible?

spyder81 · on Jan 15, 2020

It's not impossible, particularly in cases where the model is fairly flat.

In complex models it is symptomatic of a larger issue with complexity in CRDTs; the post was already quite long and I didn't want to repeat any more of the research I'd linked to.

jiofih · on Jan 15, 2020

You mention that example as a turning point for choosing OT, confirming your research. But to an outsider it is not particularly convincing. Especially since you’d expect a CRDT implementation to use the rope/tombstone structure, applying the transform to that sequence of characters and not actually have to split a text node, which looks like a limitation of this particular implementation, or tight coupling between the data model and the DOM.

The post is a good introduction to OT and CRDT, but is very light on actual trade-offs and seems to end with a gut-feeling conclusion, when the title promised “trade-offs between”.

spyder81 · on Jan 15, 2020

It's hard to offer more detail than that in a blog post. I'm not going to repeat the research papers or write one of my own. The post is already so long and complex, covering so many surface details, I couldn't reasonably push it any further.

jiofih · on Jan 16, 2020

I’d suggest skipping right to the meat of the subject (pros/cons in this case), assuming some previous knowledge, if that’s what the article proposes to share, and simply point readers to learning resources; it’s not possible to start from scratch on every topic. The deeper the subject, the more glaring this problem becomes.

spyder81 · on Jan 15, 2020

I wasn't really trying to say OT is better than CRDT. Nor was I intending to provide a complete detailed explanation (there's enough research already).

I was driving at CRDT not being capable enough for arbitrary HTML structures, and giving a quick summary of what was an extended multi-week research process.

lxpz · on Jan 14, 2020

> the prospect of peer-to-peer editing with end-to-end encryption is an exciting one

Have you heard of CryptPad ? Not exactly peer-to-peer but has encryption so that the server sees nothing.

spyder81 · on Jan 15, 2020

I haven't! This is what our server team is looking at as an option for encrypting our RTC traffic.

https://tweetnacl.js.org/ https://github.com/dchest/tweetnacl-js/wiki/Examples

billconan · on Jan 14, 2020

this is a wonderful writeup! thank you!

I'm researching the same thing.

Any tutorial on OT implementation? Especially, I need to handle structured data, like json.

DennisP · on Jan 14, 2020

This article popped up on HN recently and is a great intro to CRDT implementation. It says OT is much more difficult.

http://archagon.net/blog/2018/03/24/data-laced-with-history/

spyder81 · on Jan 14, 2020

That's included in my research links, yes :D

teleclimber · on Jan 14, 2020

Here is a 4+ hour class recording on the topic. I started going through it and it has helped me understand ot better.

https://youtu.be/xovq8MXVsZM

spyder81 · on Jan 14, 2020

The second post will cover which editor model we chose, and talk a little bit about the OT implementation details ;)

billconan · on Jan 14, 2020

Looking forward to it! Thanks again!

fyp · on Jan 14, 2020

The only thing I remember from the CKEditor post from last year was that it took "42 man-years" to implement: https://news.ycombinator.com/item?id=18220020

I wonder if it's better understood now and can cost less R&D time.

spyder81 · on Jan 15, 2020

It is, as I'll cover in the second post ;)

marknadal · on Jan 14, 2020

w00h00, this is my area of expertise.

After 8 years of working on this, I have changed my thoughts:

- The correct algorithm is not always the correct user experience.

- End-to-end encryption is too important to not have.

- Offline support is great, but it behaving consistently is more important than it behaving "intently".

- Biggest pain points can most easily be solved at the editing layer, not data layer.

As a result, OT gets thrown out the door immediately.

I built the most popular CRDT powered database on the market (https://github.com/amark/gun, 11K+ stars, 11M+ monthly installs), but have some harsh words for CRDT obsessed people:

CRDTs are great, but if you create an infinitely complex one to handle "every word editing operation" it'll actually result in an inferior user experience.

As such, for example, GUN supports lock-free concurrent operations on any depth of data in a graph, but keeping text/strings behavior atomic is way more important than having built-in automatic string merges.

Why? It is much simpler to build a correct text/string CRDT on top of a predictable, stable, graph CRDT that is understandable.

Another example is, in our blog editing tools, despite having spent years researching & implementing (+ others in our community) precise character-by-character CRDT resolution schemes, I found I personally had a much better offline & local first user experience with a predictable per-paragraph sync scheme.

This is a stupid simple approach, but what it means is that I as a user, can easily predict whats going to happen, so if I'm offline I know I should just copy a new paragraph and fiddle with things there, cause it isn't gonna get "auto delete regrammar merged".

Generally speaking, me and colleagues don't "fight" over the same paragraph, we're usually concurrently writing different "sections" at the same time, it is pretty rare to grammar fix their edit same paragraph as they're typing it - that is also just kinda, rude looking.

Anyways, finally, is that the majority of text styling and DOM edge cases can be handled by having a deterministic canonical DOM hierarchy that always gets applied at the editing layer, BEFORE any CRDT operations even occur.

So for instance, re-arrange the DOM tree such that <i> is always inside <u> inside <b>, or something like that.

We built this into a library called normalize, and it instantly eliminated so much complexity, especially at the CRDT level. Ping me if you want a demo of this library.

Finally, for everyone else, we also built an INTERACTIVE CARTOON EXPLAINER of an extremely basic text CRDT to help others understand the more detailed concepts:

https://gun.eco/explainers/school/class.html

spyder81 · on Jan 15, 2020

User experience is definitely why we didn't go for an existing CRDT solution.

The model we've chosen is fairly agnostic to whether OT or CRDT is used to collaborate, as I'll expand on in the second post, so if we see an opportunity to switch to a CRDT-based solution we'll certainly jump on it!

You only need to normalise things like bold/italic/underline when using a browser DOM, every collaborative rich text model I've seen represents formatting as unordered attributes which is much easier to collaborate on.

throwaway739103 · on Jan 14, 2020

You are a charming and energized speaker and perhaps that is why Tim Draper thinks you a good investment as with others he has made. Acronyms like 'PTSD' and 'PARTY' in the project is likely good at attracting people who think they sound fun and do not question many details.

It is difficult to get brief clear details about how gun.js works. There are distracting or simple documents, discussions about high level ideas and tangentially-related topics. In some cases 'worse is better' yes but it sets a tone that the project is hiding internal problems of implementation or can not easy explain its own design.

The transparency of gun.js enterprise organizations also raises questions. Are people on your team related to you? Do you pay yourself and team and open source contributors?

marknadal · on Jan 14, 2020

Thank you!

They also like the math & science behind it, some investors do DD, some don't.

You're tone is pretty assuming, smug, and insulting.

It is running in production with millions of people, yes there are some hiccups, but surviving & we're improving it to scale even more.

I rebuild GUN from scratch in 30min on stage, that is how little/simple GUN is:

https://gun.eco/docs/Porting-GUN

Do I pay myself? Lol. Yes. No, I do not pay OSS, but yes people contribute regardless, strangers help me, my kids help me, investors help me, family & friends help me.

mkl · on Jan 14, 2020

With end-to-end encryption in GUN, what role does the server play? I assume the merging type activity happens in the clients?

BTW, the name makes it almost impossible to search for info about GUN.

marknadal · on Jan 14, 2020

Yes, merging is per peer, not server transformed.

"Servers" act as WebRTC bootstrapping signals, storage backups, relay/routing, etc., and you can have as many of them as you want (no need to be centralized).

True. `gunDB` tag should help a little.

josemaenzo · on Jan 14, 2020

Do you have any document that explain what resolution algorithm uses in what cases?

For example, one peer change a property value and the other peer deletes it.

marknadal · on Jan 14, 2020

Same algorithm for everything.

This cartoon explains it:

https://gun.eco/distributed/matters.html

It is a vector + timestamp + lexical sort.

A delete is changing a value to `null`, it would lose (I assume you are asking if/when these 2 changes happen at the exact same microsecond time, conflicting?) as is it has a lower lexical rank.

josemaenzo · on Jan 14, 2020

Can you refer me to a document where you explain the "lexical rank" that you are referring to?

The cartoon explanation do not tackle those terms.

marknadal · on Jan 14, 2020

JSON.stringify('a') < JSON.stringify('z')

https://en.wikipedia.org/wiki/Lexicographical_order

josemaenzo · on Jan 14, 2020

Cool! Make sense.

Let's say our state have an array like ['a','b'].

What about if Peer-A splice the first element (0) which results in ['b'] and Peer-B wants to mutate 'a' to 'A' which results in ['A','b'] and they both make the mutation at the same millisecond?

marknadal · on Jan 15, 2020

You have to choose which type of Array:

1. An atomic Array

2. Fixed index Array

3. Linked list

Now rather than not knowing what magic is going to happen, app developers know what to expect in advance.

So atomic would choose one or the other, not a combined result.

Fixed array 2nd item to 1st which nulls 2nd and makes 1st b, simultaneously also makes 1st A. Cause lower case b is lexically larger than A, you'd get [b, null].

Linked list theoretically could handle it as you suggest.

josemaenzo · on Jan 15, 2020

Does gun have a built-in "Linked list" in the API? If so, where are the docs? Could not find them.

If not, any paper or doc that explain how to implement it with gun?

marknadal · on Jan 15, 2020

Started, but never finished. :(

@nmaro has some pretty good stuff.

The cartoon explainers gives an overview.

But you'd probably want to do RGA algorithm or one of the newer ones.

Martin Kleppmann has a pretty good talk with bunch of paper references:

https://youtu.be/yCcWpzY8dIA

josemaenzo · on Jan 16, 2020

Oh, I see. Just wondering, How GUN handle a situation like described above when working with Arrays?