That’s just the example client code, not the sync code. It’s showing you how to demo that the sync works. You’re not expected to use that in production.
I mean, to be fair, even when using something like Postgres logical replication, there's no way to verify whether a particular change has replicated (minus xmin hackery).
If I were to write a tutorial on how to set up PG replication and "observe the replication", I think that would probably involve "write to master, sleep for a bit, then see the data is in the secondary."
The problem is that Postgres is a long-lived server, so the app can fire a commit at the DB and then exit. Here, the app needs to perform the sync, so you can't exit. I didn't see a way to confirm the sync was complete either, so when can you safely exit?
Couldn't you just query the replication log positions on the primary and standbys and calculate their delta?
I asked ChatGPT-4 about this and it suggested something like the following, which seems reasonable:
On the primary, SELECT * FROM pg_stat_replication; gives you one row per standby, and the write_lsn/flush_lsn/replay_lsn columns tell you how far that standby has gotten in the WAL. On the standby, SELECT pg_last_wal_receive_lsn(); tells you how much WAL it has received. Then you can compute the delta in bytes between any two of those positions with SELECT pg_wal_lsn_diff(lsn_a, lsn_b); or similar.
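Concretely, a rough sketch of checking the lag in bytes (assumes Postgres 10+ naming, psycopg2 installed, a role with monitoring privileges, and placeholder connection strings):

    # Sketch: replication lag in bytes. Assumes Postgres 10+ and psycopg2;
    # the DSNs below are placeholders.
    import psycopg2

    def primary_lag_bytes(primary_dsn):
        # On the primary: how far each standby's replay position trails the
        # primary's current WAL write position.
        with psycopg2.connect(primary_dsn) as conn, conn.cursor() as cur:
            cur.execute("""
                SELECT application_name,
                       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes
                FROM pg_stat_replication
            """)
            return cur.fetchall()

    def standby_apply_lag_bytes(standby_dsn):
        # On the standby: WAL received but not yet replayed.
        with psycopg2.connect(standby_dsn) as conn, conn.cursor() as cur:
            cur.execute("""
                SELECT pg_wal_lsn_diff(pg_last_wal_receive_lsn(),
                                       pg_last_wal_replay_lsn())
            """)
            return cur.fetchone()[0]

    if __name__ == "__main__":
        print(primary_lag_bytes("host=primary.example dbname=app"))
        print(standby_apply_lag_bytes("host=standby.example dbname=app"))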
Sync notifications need to be async. That goes for `fsync()`/`fdatasync()` as well as for `COMMIT`. In the case of a database-using application you can report transaction non-failure (as in: it was well-formed and accepted) in a UI separately from transaction durability success (i.e., it really finished), and if you have replication to worry about, you can also report replication success separately.
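For the application side, a toy asyncio sketch of that split; nothing here is any particular database's API, the WAL write and replicate() call are just stand-ins:

    # Toy sketch of reporting "accepted" vs "durable" vs "replicated"
    # separately. os.write/os.fsync stand in for the engine's WAL handling,
    # and replicate() is whatever ships the change to replicas.
    import asyncio, os

    async def commit(txn_bytes, wal_fd, replicate):
        # Returns as soon as the transaction is accepted; the two futures
        # resolve later so the UI can report each stage on its own.
        loop = asyncio.get_running_loop()
        durable = loop.create_future()
        replicated = loop.create_future()

        async def background():
            os.write(wal_fd, txn_bytes)
            await asyncio.to_thread(os.fsync, wal_fd)   # durability point
            durable.set_result(True)
            await replicate(txn_bytes)                  # replication point
            replicated.set_result(True)

        asyncio.create_task(background())
        return durable, replicated

    # Caller: durable, replicated = await commit(...); update the UI as each
    # future resolves instead of blocking the whole request on the slowest step.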
Every blocking operation needs an async variant. Everything should be async-first.
Those silly Internet Protocol guys thought we'd use UDP multicast for such things. That was before everybody knew that "cluster" was just a code word for five 1-2 core VM instances scattered between continents, laptops, and cloud providers.
Litestream/LiteFS author here. It's been a while since I looked at Mycelial but IIRC it does eventually consistent replication using CRDTs. That's great for systems that need multi-master but there are consistency tradeoffs that you have to watch out for.
Litestream & LiteFS are both physical, async replication tools so they actually work at the page level. When a transaction commits, the changed pages are shipped off to a replication destination. By replaying those changes in the same order, you can rebuild the current state of the database (or any point-in-time state in between). Both of these systems require a single primary node at a time so multi-master doesn't work with them.
Litestream only replicates to "dumb storage" (e.g. S3, SFTP) whereas LiteFS replicates to other live nodes so you can have live replicas.
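In toy form, the replay side is roughly this shape (heavily simplified, none of the real formats, framing, or edge cases):

    # Toy illustration of ordered, page-level replay: apply (txid, page_no,
    # page_bytes) records in commit order to rebuild the database image.
    # Page numbering here is 0-based for simplicity.
    PAGE_SIZE = 4096

    def apply_change_log(db_image, changes):
        # db_image: bytearray; changes: iterable of (txid, page_no, page_bytes)
        for txid, page_no, page_bytes in sorted(changes):
            offset = page_no * PAGE_SIZE
            if len(db_image) < offset + PAGE_SIZE:
                db_image.extend(b"\x00" * (offset + PAGE_SIZE - len(db_image)))
            db_image[offset:offset + PAGE_SIZE] = page_bytes
        return db_image

    # Replaying the whole log into an empty bytearray() rebuilds the current
    # state; stopping at an earlier txid gives a point-in-time snapshot.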
cr-sqlite [0] is another option. It has the added benefit of being able to run in the browser (persisting to IndexedDB à la absurd-sql, though it uses a fork of wa-sqlite [1]). In theory it could also persist to the Origin Private File System via SQLite's official implementation, though switching doesn't seem to be strictly better [2]. A con is that it doesn't enforce foreign key constraints (though of course you can still do joins). cr-sqlite also has better documentation about which CRDTs it supports than Mycelite [3], which is something I brought up in a recent thread [4] (which might have triggered this post). I'm personally using cr-sqlite in an Anki clone I'm building [5] and have only positive things to say about cr-sqlite and its maintainer.
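For a feel of the API, here's roughly what using it from Python looks like. This is from memory, so double-check names like crsql_as_crr / crsql_changes / crsql_finalize against the cr-sqlite docs; the extension path is a placeholder and your Python's sqlite3 needs to be built with extension loading enabled:

    # Rough sketch of cr-sqlite from Python; function/vtab names are from
    # memory of the cr-sqlite docs, and the extension path is a placeholder.
    import sqlite3

    conn = sqlite3.connect("notes.db")
    conn.enable_load_extension(True)
    conn.load_extension("./crsqlite")  # prebuilt loadable extension

    conn.execute("CREATE TABLE IF NOT EXISTS notes (id PRIMARY KEY NOT NULL, body)")
    conn.execute("SELECT crsql_as_crr('notes')")  # make the table a replicated relation

    conn.execute("INSERT OR REPLACE INTO notes VALUES ('a1', 'hello')")

    # Changes to ship to (or merge from) a peer go through the crsql_changes vtab.
    for row in conn.execute("SELECT * FROM crsql_changes"):
        print(row)

    conn.execute("SELECT crsql_finalize()")
    conn.close()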
I’m unsure why this would be compared with Actual Budget or CRDTs in the README.
The issue that CRDTs are solving isn’t simply getting the bytes from one device to another, it’s ensuring that conflicts are merged in a sane way, which this project doesn’t clearly address. Simply syncing a DB across multiple devices won’t help if your app isn’t designed to handle data conflicts.
>The issue that CRDTs are solving isn’t simply getting the bytes from one device to another, it’s ensuring that conflicts are merged in a sane way
CRDTs merge data in a predictable way but not necessarily in a consistent or correct way according to the requirements of any particular application.
I find CRDTs interesting and they are certainly useful for some types of applications with relatively lax consistency rules. But I don't see how they help with the sort of consistency requirements that finance apps are bound to have.
Developing each new CRDT, including its merge strategies, is a fundamental research problem.
How likely is it that the right one already exists (or that you can combine a few of them) to express some arbitrary, application-defined consistency rule or merge strategy?
On top of that, any solution would have to agree with all your other requirements, such as your chosen degree of normalisation, your performance requirements, etc.
From my understanding, that was OP's point. This project isn't a research project; it's a one-size-fits-all coding project that purports to solve problems that you need a research project on a single domain in order to solve. It is not conceivable that this project could solve the problem that a CRDT is meant to solve, any more than you can parse HTML with regex. But the readme explicitly compares it with CRDTs and with a program that uses CRDTs to solve its problem.
The question OP is asking is, why is it comparing itself to CRDTs and Actual Budget? If Mycelite has special sauce, what is that special sauce? If it doesn't have special sauce, do they understand they're attempting to solve a problem which requires special sauce?
You hit the nail on the head. I don't mean to imply that CRDTs are perfect or usable for every use case, but they're a way of handling sync conflicts.
It is just unclear to me how this project does so, and if it doesn't, then why it even references CRDTs in the first place.
I read OP as implying that CRDTs solve the problem while they are questioning whether this project does. My response to that is to question whether CRDTs solve the problem in the general case (although they may well be helpful in a specific case).
We've been building iOS/macOS apps in local-first fashion for years.
It works differently: there's a persistence library called Core Data on each client (or, more recently, SwiftData). You can choose which backing store to use, most likely SQLite, but you could also use XML files to store your data.
And then Core Data simply makes sure everything is in sync. It's an Apple system library, which means your data will automatically sync even if the app isn't running.
This feels like a better approach than OP's Mycelite or the better-known cr-sqlite.
Any reason why syncing on the database level would be better? It feels like something the library should do, not the database.
CoreData is a footgun when it comes to threading; my approach is to use a wrapper lib (e.g. CoreStore) that takes the sharp edges off, or just use SQLite directly (or with a lib like GRDB).
To be slightly pedantic and clarify: it's not actually Core Data itself doing the syncing but CloudKit, more specifically NSPersistentCloudKitContainer, which came out a few years ago and basically makes adding CloudKit syncing to Core Data a one-liner.
This year at WWDC we got CKSyncEngine, which gives you the primitives to add CloudKit syncing to your own persistence layer.
CoreData doesn't work on any other platform, so you can only make Apple apps. If you wanted to have a web version, an Android version, or whatever else, you couldn't. So it's not a very good choice for most use cases.
We use Realm for this. Very similar to Core Data, just removing some of the obvious footguns, but it works across both iOS and Android, and the real-time syncing is excellent.
I haven't used it in a long time but we had a lot of issues with random crashes showing up in logs and the compile-times are pretty bad. Has this gotten better since the Mongo acquisition?
CoreData is fine for syncing a single user’s data between their devices. Something like cr-SQLite handles the more challenging case of syncing collaborative data between multiple users.
It's funny how local-first is now considered "a new type of application". Ten years from now there will be a blog post here presenting a revolutionary type of application - "online first", to reduce the need of all this annoying complexity, synchronisation, multiple clients, etc.
> Now we're starting to see renewed interest in personal computers.
'cept this time round, those slick black rectangles are so locked down you can hardly program on them. Not to mention that the people using them are mostly just looking at tiktok, twitter, instagram, snap, youtube.
The number of technical people today is vastly larger than it was back then. Their number just looks small because there's a new phenomenon of mainstream casual computing and those numbers dwarf anything else.
Personally I don't consider the black rectangles real computers. They're consumer content consumption devices, basically like televisions. They are to computers what a TV is to a production studio.
It's a weird way to phrase it, but "local-first" is different than "local-only", which was my gut reaction to the phrase. I think as a flip over the remote-then-sync model, it's apt, but could probably use some rebranding.
These days quite a lot of our problems are synchronisation problems (are you pulling mutable data regularly from a remote API and storing it? that's actually synchronisation), but despite that, synchronisation and how complicated it can be doesn't seem to get discussed. You don't hear debates about different approaches, and we don't seem to have a taxonomy of different techniques.
People seem to forget how hard it is. So let's say you're pushing changes from one system to another. How are you detecting changes? Timestamps, or a direct diff? Or checksums? Have you remembered to think about deletions? Can you even detect deletions? How do you keep track of which bits have successfully synced and which haven't? If something unusual goes wrong (the network goes down) halfway through a sync, can you recover from that? Incremental sync inevitably drifts over time due to bugs and downtime, so are you also building a full-scale wipe-and-refresh or compare-and-reconcile to accompany the incremental sync? And don't even get me started on two-way syncs and conflict resolution, and how hard that is to do without some sort of review-by-human component.
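To make that concrete, here's the kind of deliberately naive one-way sync people usually start with: last-modified timestamps, tombstones, and a persisted cursor. Table and column names are made up, the destination is assumed to mirror the source schema, and every shortcut in it maps to one of the questions above.

    # Deliberately naive one-way incremental sync: timestamps for change
    # detection, tombstone rows for deletions, and a persisted cursor so an
    # interrupted sync can resume. Clock skew, tombstone cleanup, and drift
    # (hence the periodic full reconcile) are left as exercises, which is
    # exactly the point.
    import sqlite3

    def sync_once(src, dst):
        dst.execute("""CREATE TABLE IF NOT EXISTS sync_state
                       (source TEXT PRIMARY KEY, last_seen REAL)""")
        row = dst.execute(
            "SELECT last_seen FROM sync_state WHERE source = 'items'").fetchone()
        last_seen = row[0] if row else 0.0

        # Source rows carry updated_at and a deleted flag; without tombstones,
        # deletions are invisible to a timestamp-based diff.
        changed = src.execute(
            "SELECT id, body, deleted, updated_at FROM items WHERE updated_at > ?",
            (last_seen,)).fetchall()

        with dst:  # one transaction: the batch and the cursor land together or not at all
            for id_, body, deleted, updated_at in changed:
                if deleted:
                    dst.execute("DELETE FROM items WHERE id = ?", (id_,))
                else:
                    dst.execute("INSERT OR REPLACE INTO items VALUES (?, ?, ?, ?)",
                                (id_, body, deleted, updated_at))
                last_seen = max(last_seen, updated_at)
            dst.execute("INSERT OR REPLACE INTO sync_state VALUES ('items', ?)",
                        (last_seen,))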
I think the reason it's not discussed that much is that a lot of software has given up on properly synchronizing changes and goes with a "handwaving eventual consistency" approach for a lot of things.
Furthermore, decisions about which db to use are rarely driven by the synchronization model. Except maybe sometimes for the question "eventual consistency: yes/no/only a bit".
And in academia (wrt. common use cases) we do have a bunch of well-analyzed ways to achieve synchronization; the main question often isn't one of academic research still needing to be done, but of which model, with which parameters and implementation details on top, to use for a specific kind of db. Which is mainly discussed by devs when writing new dbs, i.e. not an everyday thing. While for the many new sqlite sync approaches the answer is nearly always "handwaving eventual consistency", because nothing else would work on the edge anyway.
sure some systems under correct operation will never ever be fully in sync as long as they are used
but most systems also do not need to be ever fully in sync
it's good enough that, for a specific context, they will be in sync in not too much time if that context stops changing
and that is something they do provide
e.g. after updating a JSON blob stored under a specific id, that update will eventually be available in the not-too-distant future, and if no further changes to that document happen, then in the context of that document the system will be fully in sync. But because you have very many documents, there will always be some document which isn't yet in sync, and in turn the system as a whole will never be fully in sync. But in the end that doesn't really matter.
Also: no computer system will live that long, and if they aren't put in sync before they stop working they will literally never be fully in sync.
but a system as a whole never being fully in sync is a red herring argument which misses the point and focuses on an aspect which might sound like a problem but hardly ever is a problem at all
I’m just messing around with SQLite and serverless on Vercel and really like the simplicity of having just a db file you can read locally instead of setting up and connecting to a db instance somewhere else.
This wouldn’t work ofc as fs access in Vercel is limited, plus syncing the db would (probably) delay the network response. But
A few sites of mine are backed by SQLite databases replicated daily from another source. Right now I’m rsyncing them, but sending an ever-increasing number of gigabytes a day isn’t great for the long term. Anyone aware of some sort of checkpointing and incremental sync solution for SQLite? It doesn’t need to be real time, and shouldn’t be real time, really. This one isn’t what I’m looking for unfortunately; the “sign up for a free account” part is also pretty crazy.
Verneuil https://github.com/backtrace-labs/verneuil might work for you. Its replication client can use a persistent chunk cache (no fsync, so doesn't survive reboots), and there's a CLI utility to reconstitute a sqlite db file (`verneuilctl restore`). It also tries to do the right thing by writing to a temporary file before renaming it over the old db file.
On the write-side, you can load the verneuil VFS as a runtime extension. There's a similar gotcha with reboots, where the replication state is persisted but not fsync-ed, so you get a full refresh on reboots. The S3 storage is content addressed though, so you "only" pay for useless bandwidth and API calls, not storage.
The SQLite Session extension[1] is one option for shipping row-level change sets between databases. It may be more efficient than rsync and you can do it in batches.
Can you talk more about your use case? I'm the author of LiteFS which does real-time replication of databases. It doesn't sound like it's a fit for your use case but I'm always interested in hearing about what people are doing.
At this point, why not switch to using a real database like MariaDB or PostgreSQL? These days you don't even have to run it yourself; every cloud provider offers it as a service and can handle backups, patching, and replication for you.
I commented on another thread about cr-sqlite [0]. In addition to that, I believe Mycelial is VC funded, whereas Matt has Github sponsors. I hope there's a future where he teams with, say, Fly.io to make cr-sqlite sustainable.
I wish sqlite were performance-focused. Perf is pretty much always worse than a db server, even after tuning the pragmas (the way it locks tends to limit single-threaded perf and concurrency). The main benefit of sqlite is just its portability, since you can easily embed it. It's nice for slapping a db backend on a client-installed app, where trying to spin up a db server would be problematic.
There's probably an extension for permissions in sqlite but out of the box the db files are plaintext (you need an extension for that).
In my experience, SQLite has excellent performance if you use it correctly. It can easily do 10k to 500k inserts/second, depending on the workload.
SQLite only supports a single writer at a time, but does support concurrent reads. There is also an experimental branch that supports concurrent writes.
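Easy to check for yourself; the big difference is batching inserts in one transaction with WAL mode instead of autocommitting each row. Numbers obviously depend on hardware and schema, so this is just a quick sketch:

    # Quick insert-rate check: WAL mode plus one transaction for the whole
    # batch, instead of one autocommit (and one fsync) per row.
    import sqlite3, time

    conn = sqlite3.connect("bench.db")
    conn.execute("PRAGMA journal_mode=WAL")    # readers don't block the writer
    conn.execute("PRAGMA synchronous=NORMAL")  # fewer fsyncs; WAL stays consistent
    conn.execute("CREATE TABLE IF NOT EXISTS t (id INTEGER PRIMARY KEY, v TEXT)")

    rows = [(i, f"value-{i}") for i in range(100_000)]
    start = time.perf_counter()
    with conn:  # single transaction for the whole batch
        conn.executemany("INSERT OR REPLACE INTO t VALUES (?, ?)", rows)
    elapsed = time.perf_counter() - start
    print(f"{len(rows) / elapsed:,.0f} inserts/sec")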
Any numbers to support that claim? SQLite performs pretty well when just reading. Only write-intensive workloads are usually slower than on pgsql and others.
The SQLite db files are definitely not plain text (that's only true in the cryptographer's sense of "plaintext", and even then not necessarily, because they might be encrypted internally using an extension).
SQLite is absolutely focused on performance and excels at it for its intended use cases. You can read about them here: https://www.sqlite.org/whentouse.html