Automerge-Repo: A "batteries-included" toolkit for local-first applications

pcl · on Nov 8, 2023

Cool stuff!

What do you suggest is the sweet spot for document size and "hotness"? Your cookbook [0] says "We suspect that an Automerge document is best suited to being a unit of collaboration between two people or a small group." Does that mean tens of kilobytes? Hundreds? More? And how much concurrent contention is viable? And is the "atom of contention" the document as a whole, or do you have any plans for merging of sub-parts?

Also, do you have support for juggling multiple transports, either concurrently or back-to-back? In particular, I'm thinking about synchronizing via the cloud when connected, and falling back to peer-to-peer when offline. In that peer-to-peer case, how many peers can I have, and can my peer network behave as a mesh, or must it stick together to some degree?

And finally, it looks like your tutorial [1] doesn't actually exist! You refer to it in a blog post [2], but it's a dead link.

[0] https://automerge.org/docs/cookbook/modeling-data/

[1] https://automerge.org/docs/tutorial/introduction/

[2] https://automerge.org/blog/automerge-2/

pvh · on Nov 8, 2023

The way I think about it is that if the data should always travel together it should be in one document. For example -- if your TODO list always goes as a unit, then make it an array of objects in a single Automerge document. On the other hand, if you want to build an issue tracker and to be able to link to individual issues or share them individually then a document each is the way to go. Does that help?

As for network transports you can indeed have multiple at once. I usually have a mix of in-browser transports (MessageChannels) and WebSocket connections. I suspect we'll need to do a little adjusting to account for prioritization once people really start to push on this with things like mDNS vs relay server connections but the design should accommodate that just fine.

As for the docs, my apologies. The "tutorial" was merged into the quickstart as part of extensive documentation upgrades over the last few months. We should update the link in the old blog post accordingly.

Here's a link to save you the effort: https://automerge.org/docs/quickstart/

pcl · on Nov 8, 2023

Cool, thanks for the details.

So if I smoosh everything in my sorta “collaboration context” together into one document, are there any provisions for delta updates on the wire? Your browser-side storage format sounds like it’s compatible with that approach, but what about clients that are far apart version-wise? Are you storing full relay history and also a snapshot?

I see in your format docs [0] that you store change chunks. Are these exposed in the API for atomicity at all? Are there any atomicity guaranties?

And you discuss backends, but I don’t see any pointers to an S3 or Postgres implementation. Is that something you’re keeping closed source for your business model, or am I just missing something?

I haven’t found anything about authorization. Have you done any work there? I quite like the Firebase model in which you can write simple validation rules that can evaluate against the document itself —- “only allow users who are listed in path `members` to write to this document” or whatever.

[0] https://automerge.org/automerge-binary-format-spec/#chunk-co...

pvh · on Nov 9, 2023

The sync protocol does indeed calculate the delta between peers and efficiently catches both sides up.

The backends you see are the ones I use, but the API is a binary blob key value store with range queries: supporting other stores should be straightforward.

Authentication isn’t exactly left as an exercise to the reader but is an area of active work. I would say securing access to a URL via whatever mechanism you’re used to should be fine for client server applications and peer to peer folk seem to mostly have their own ideas.

xrd · on Nov 8, 2023

If you are interested in this, check out the video from StrangeLoop 2023:

https://www.youtube.com/watch?v=Mr0a5KyD6BU

Also, check out the unconf for localfirst that happened right after 2023:

https://github.com/LoFiUnconf/stlouis2023

Ink & Switch is doing such interesting stuff. Their after party at StrangeLoop was so cool.

bryanlarsen · on Nov 8, 2023

It's too bad the unconf was full, I didn't get in. Hopefully they do it again.

satvikpendem · on Nov 8, 2023

Nice, I use automerge with Rust via autosurgeon [0] which is their Rust wrapper, but looks like it hasn't been updated recently, any updates on that? I'm guessing with a small team that web support is taking priority right now, as I'm running this on my Rust client (technically Flutter but via the FFI package flutter_rust_bridge [1]) and server (via the Axum web server crate).

[0] https://github.com/automerge/autosurgeon

[1] https://github.com/fzyzcjy/flutter_rust_bridge

izak30 · on Nov 8, 2023

We're using the same stack, along with automerge-repo-rs, we haven't needed much in the way of updates, what are you hoping for that doesn't exist?

Edit: Typo `autosurgeon-repo-rs` to `automerge-repo-rs` and link. https://github.com/automerge/automerge-repo-rs

satvikpendem · on Nov 8, 2023

Is autosurgeon-repo-rs separate from autosurgeon? I can't find anything by that specific name on Google.

The API is still a little clunky, with hydrating and reconciling, and it's not as clean as the automerge-repo one, especially with those React examples.

izak30 · on Nov 8, 2023

Sorry for the typo. Updated. I get what you mean but maybe I've gotten used to it. We also added a little sibling library to it for partial hydrating and reconciling that fits our patterns better. https://github.com/bowtieworks/automerge_orm

caelinsutch · on Nov 9, 2023

Curious how you think about this compared to Electric SQL [1] - I'm currently deciding what sync solution we're going to use for a product rebuild and have been looking at quite a few

[1] https://electric-sql.com/

scotttrinh · on Nov 8, 2023

Super excited to see Automerge getting this high-level API out. Been following since before 1.0 and I can't wait to play around with the latest incarnation! Congrats to the Automerge team.

pvh · on Nov 8, 2023

Thanks, Scott. This API should make it much, much easier for folks to build with Automerge and kind of just encapsulates everything we've been doing in-house over the last few years.

zyang · on Nov 8, 2023

Last time I looked into CRDT, automerge was not as fast/efficient as yjs, but the team was actively improving the algorithm. Is there any benchmark to show the progress.

heathermiller · on Nov 8, 2023

still not as fast/efficient as Yjs. there are some benchmarks here from late September’23: https://arxiv.org/abs/2212.02618

disclaimer: i’m a co-author and the paper is focused on a different CRDT framework, but point is that it measures Yjs and automerge side by side

pvh · on Nov 8, 2023

The benchmarks Matt Weidner has been working on are great and outside scrutiny is always welcome, but I should note that I find there's an element of artificiality to them. In particular, testing the performance of the sync system while simulating many users typing into the same document doesn't really measure behaviour we have observed "in the wild". In our research, we've found that editing is usually serial or asynchronous. (See https://inkandswitch.com/upwelling for further discussion of our collaboration research.)

The benchmark that concerns me (and that I'm pleased with our progress on!) is that you can edit an entire Ink & Switch long-form essay with Automerge and that the end-to-end keypress-to-paint latency using Codemirror is under 10ms (next frame at 100hz).

While these kinds of benchmarks are incredibly appreciated and absolutely drive us to work on optimizing the problems they uncover, we try to work backwards from experienced problems in real usage as our first priority.

heathermiller · on Nov 8, 2023

Ouch Peter. Massive offense taken.

> In our research, we've found that editing is usually serial or asynchronous.

Medium-to-large-size company with a town hall = many people editing a document at the same time. Workshop at a company or a university with a modest size classroom = many people editing a document at the same time. I can't tell you how many times our web-based collaborative code editors would fall over during talks with small audiences we would give back in the days when I led the Scala Center.

Just because one of the benchmarks you have seen (of a multitude of benchmarks) breaks automerge by stressing it in what we believe is the most stressful scenario possible– multiple concurrent users, which is sort of the point of concurrency/collaboration frameworks, does not make it artificial or worth so flippantly discarding.

> long-form essay with Automerge and that the end-to-end keypress-to-paint latency using Codemirror is under 10ms (next frame at 100hz)

Not at all what we measured.

I'd just like to register here that Yjs is the framework most widely used "in real usage" (your words) and not automerge (for many reasons, not just performance.)

pvh · on Nov 8, 2023

Please accept my unreserved apologies, Heather! No offense is intended. I can speak for everyone working on Automerge when I say that we've very much appreciated Matthew's work and have indeed spent quite a lot of time studying and responding to it. We spoke about it in person last week, in fact.

As for the use-cases, I do not mean to exclude live collaboration from consideration, just to note that it hasn't been our focus or come up often in the use-cases we study. Live meeting notes are definitely a real use-case and I don't dispute the performance results you show.

As for Y-js, it's a wonderful piece of software with excellent performance and a vibrant community made by exceptional people like Kevin Jahns. We simply have slightly different goals in our work, which undoubtedly reflect where our engineering investments lie.

Indeed, your paper did not measure the same things we look at, and that's why it found new results. Hopefully in time we will join the other systems in performing well on your benchmarks as well.

genuine_smiles · on Nov 9, 2023

> We simply have slightly different goals in our work, which undoubtedly reflect where our engineering investments lie.

I’d love to hear more about this. Do you elaborate anywhere?

neftaly · on Nov 11, 2023

I have been writing a video game using automerge-repo for networking & save files. I researched Yjs and Automerge and felt that Yjs is better suited to an ongoing session like a conference call, whereas automerge is better suited for network partitions etc. This fit my use-case best. My opinion might be out-of-date as this area is moving quickly, and there are quite a few options out there now.

titzer · on Nov 9, 2023

> there's an element of artificiality to them. In particular, testing the performance of the sync system while simulating many users typing into the same document doesn't really measure behaviour we have observed "in the wild".

I've seen Matt's work and I think it's quite reasonable to benchmark a concurrent datastructure under concurrent load. Placing systems under high load, even just as a limit study, is how we reveal scalability bottlenecks, optimize them, and avoid pathologies. It's part of good engineering.

If your work can produce more representative workloads from the real world, then they could add to the field's knowledge with new benchmarks.

Terretta · on Nov 9, 2023

> testing the performance of the sync system while simulating many users typing into the same document doesn't really measure behaviour we have observed "in the wild"

We use co-editing far more commonly than serial editing.

Coming from a background of XP (extreme programming, pair programming) and a Pivotal Labs style approach to co-thinking, even for executive work we require everyone in a meeting (whether at conference table or remote) to be in the document being shared, and instead of giving feedback, comment or edit in place.

We care a LOT about how laggy this works, how coherent it remains, or whether it blows up and has to be restarted, or worse, reverted.

If a firm culture "whiteboards" by having one person at the board and everyone else surfing HackerNews, they might not be exercising this. If a firm culture is that whiteboards are a shared activity, everyone gathered around holding their own marker, or even just grabbing it from each other, they might need to exercise CRDTs this way.

Put another way, if you "Share" in conf room with an HDMI cable to a TV, or share in a Teams or Zoom by window sharing, you may not be a candidate.

If you "share" by dropping a link to the document in a chat, and see by the cursors and bubbles who is following along, you are a candidate.

. . .

In "Upwelling" you describe an introverted and solitary creative process, before revealing a sufficient quality update to others.

That is certainly a valid use case for unspooling thoughts from one brain, and if those are the wilds you are observing, makes sense why that's what you'd observe in the wild.

It is not, however, the most productive for inventing solutions to logic puzzles with accuracy and correctness in fewer passes, nor for most any other "group" activity. So maybe your "not what we see in the wild" should be qualified by "but we're actually not looking for live collaboration, we're looking for post drafting merge".

That said, now the choice of the term "auto-merge" is much clearer, advertising your use case right on the tin, if one thinks about it.

So thanks for the upwelling link, repeated here for convenience:

https://inkandswitch.com/upwelling

pvh · on Nov 9, 2023

Automerge does indeed work with live collaboration, though apparently not currently as efficiently as some other solutions. Everyone working in this space is exploring and looking for solutions that will work for users woth slightly differing priorities. In addition to automerge consider checking out yjs, electricsql, diamond types, replicache, vulcn, or any of the other folks. Hopefully one of them will be just right for you.

thesurlydev · on Nov 8, 2023

This is exciting in several ways including the fact that Martin Kleppmann is involved with the project. Filed at the top of my reading list if nothing else to see an undoubtedly good example of a complex Rust project.

windock · on Nov 9, 2023

Author of Designing Data-Intensive Applications book. It is such a well-written book, reading it right now, and cannot recommend enough.

anglinb · on Nov 8, 2023

This is super powerful, been playing around with the previous releases for the past few days. It works really well, but still needs a few dx tweaks to make it performant for large applications. You have to watch the callbacks yourself to update slices of state and unless your app is small enough that the whole thing can re-render every update.

That being said, I love everything automerge is doing and hope this pace will keep up!

zby · on Nov 9, 2023

So is it like https://en.wikipedia.org/wiki/Google_Wave - but with limited scope and with new algos and finally usable?

fragmede · on Nov 9, 2023

Google Docs is the spiritual successor to Wave, and has been usable for quite some time.

Closi · on Nov 9, 2023

Spiritual predecessor - it was released 3 years before Wave :)

fragmede · on Nov 9, 2023

yeah but it didn't gain the features too be a successor to wave until much more recently. point is, the collaborative features of wave exist today :)

bomewish · on Nov 8, 2023

Seems we have a really great technical spec — but aren’t y’all gonna build a product on it and let us pay you to use? A google docs for markdown/quarto documents would be brilliant but apparently does not yet exist…

pvh · on Nov 8, 2023

Automerge is a library that anyone can adopt, and we are a research organization, not a product company.

We have built a variety of projects with Automerge, both publicly and for use in private, including recently the markdown-with-comments editor we call Tiny Essay Editor (https://tiny-essay-editor.netlify.app/) by Geoffrey Litt.

That said, sponsoring the Automerge team helps us build faster and is always welcome. (Thanks to our current and past sponsors for their support!)

GeneralMaximus · on Nov 9, 2023

Would Automerge be a good choice for a non-realtime single-user app that just needs to have reliable offline support?

E.g a personal note-taking app where the user will never have any collaborators, but where they expect the app to work fully offline on multiple devices and reliably sync up when they come online.

neftaly · on Nov 9, 2023

Yes, it's a great fit for this. You would probably want an internet-accessable sync server with a copy of the repo, so that the data is still available when no peers are online. Bluetooth/uPnP network adapters would be the cherry on top but AFAIK aren't ready yet.

neftaly · on Nov 10, 2023

*mDNS not uPnP sorry

idosh · on Nov 8, 2023

How is it compared replicache, watermelondb and the rest?

parhamn · on Nov 8, 2023

Anyone have any info on who is behind this project, how reliable it is (will it be around in 2 years), etc? Considering using it for one of my projects.

pvh · on Nov 8, 2023

Ink & Switch is behind it; or more expansively mostly Orion Henry, Alex Good, Martin Kleppmann, and myself. As an organization, we have been working on Automerge for about six years now. We also have a wonderful community of other contributors both in industry and research.

Automerge is not VC-backed software. Indeed, for a number of years Automerge was primarily a research project used within the lab. Over the last year, it has matured to production software under the supervision of Alex Good. The improved stability and performance has been a great benefit to both our community and internal users. Our intention is to run the project as sponsored open source for the foreseeable future and thus far we have done so thanks to the support of our sponsors and through some development grants.

Ink & Switch's research interests drive a lot of Automerge development but funding from sponsors allows us to work on features that are not research-oriented or to accelerate work that we'd like to do but that doesn't have current research applications. If you adopt Automerge for a commercial project, I'd encourage you to join the sponsors of Automerge to ensure its long-term viability.

dboreham · on Nov 9, 2023

Quick note for the casual reader that this is one of the oldest and preeminent projects in the space.

davgoldin · on Nov 9, 2023

Congrats! Many moons ago the lack of undo/redo was the main blocker. Has this been added?

pvh · on Nov 9, 2023

Robust undo/redo remains an ongoing research project. Leo Stewen's work was presented at PLF 2023 a few days ago. It turns out to be a subtle problem to really get completely right, but in my experience you can usually get passable results by letting editors default undo behaviour reverse text input.

For applications with more document-structured data, you can now produce inverse patches using Automerge.diff to go between any two points. To implement a reasonable undo in this environment you can record whatever document heads you consider useful undo points and then patch between them.

To perhaps expand on why the problem remains unsolved slightly further, there was a robust discussion about what the expected behaviour of "undo" out to be in even simple cases at the conference.

coding123 · on Nov 9, 2023

It sounds like ms word saving to onedrive

benatkin · on Nov 9, 2023

By calling it "repos", they're trying to capitalize on the popularity of VCS repositories, but these don't have their history implicitly tracked the way Automerge does, just explicitly tracked by committing and pushing.

I think it's cool, but I still see CRDTs as very niche.

I also want "local-first" but what I really want is something closer to how traditional desktop apps just open up, edit, and save files, not some real time collaboration that is already set up before I add my first collaborator.