Hacker News new | past | comments | ask | show | jobs | submit login

Wow, that thing is really clever.

Suggestion: since having multiple writers will corrupt the database, it's worth investigating if the recent (November 2024) S3 conditional writes features might allow you to prevent that from accidentally happening: https://simonwillison.net/2024/Nov/26/s3-conditional-writes/




I think that works, but I think they'll need to use it to implement an external lock rather than coordinating writes to a single object as those comments suggest. They use an object per page, not an object per database, so they have to synchronize updates to multiple objects instead of just one. But a global writer lock is what SQLite does locally, anyway. It's a good idea.

I see some challenges, though. Can we implement a reader-writer lock using conditional writes? I think we need one so that readers don't have to take the writer lock to ensure they don't read some pages from before an update happening concurrently and some pages from after. If it's just a regular lock, oops--now we only allow a single reader at a time. I wonder if SQLite's file format makes it safe to swap the pages out like this and let the readers see a skewed list of pages and I'm just worrying too much.

A different superpower we could exploit is S3 object versions. If you require a versioned bucket, then readers can keep accessing the old version even in the face of a concurrent update. You just need a mechanism for readers to know what version to use.


The problem with a block/page based solution (block per page is a bad idea, IMO, because S3/GCS are cost/performance optimized for around megabyte objects) is consistency: ensuring a reader sees a consistent set of blocks, i.e. all of the same database version.

Under concurrency, but also under a crash.

Similarly, the big issue with locks (exclusive or reader/writer) is not so much safety, but liveness.

It seems a bit silly to need to have (e.g.) rollback journals on top of object storage like S3.

Similarly, the problem with versioning is that for both S3/GCS you don't get to ask for the same version of different objects: there's no such thing.

So you'd need some kind of manifest, but then you don't need versioning, you can use object per version.

Ideally, I'd rather use WAL mode with some kind of (1) optimized append for the log, and (2) check pointing that doesn't need to read back the entire database. I can do (1) easily on GCS, but not (2); (2) is probably easier in S3, not sure about (1).




Consider applying for YC's Summer 2025 batch! Applications are open till May 13

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: