This is exactly what the engineers behind FoundationDB (FDB) wanted when they open sourced it. For those who don't know, FDB provides a transactional (and distributed) ordered key-value store with a fairly simple but very powerful API.
Their vision was to solve the hardest parts of building a database, such as transactions, fault tolerance, high availability, elastic scaling, etc. This would free users to build higher-level APIs ("layers") [1] / libraries [2] on top.
The beauty of these layers is that you can basically remove doubt about the correctness of data once it leaves the layer. FoundationDB is one of the most (if not the most) tested [3] databases out there. I used it for over four years in high-write/high-read production environments, and not once did we second-guess our decision.
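To make the "layer" idea concrete, here's a toy sketch: a higher-level queue API built purely on ordered key-value operations. An in-memory sorted dict stands in for the cluster here; a real layer would issue the same operations against an FDB transaction. All class and key names are made up for illustration.

```python
from bisect import bisect_left

class ToyKV:
    """Ordered key-value store exposing a minimal FDB-like surface
    (set + range read). Stands in for a real FDB cluster."""
    def __init__(self):
        self._keys = []   # sorted list of byte-string keys
        self._data = {}

    def set(self, key, value):
        if key not in self._data:
            self._keys.insert(bisect_left(self._keys, key), key)
        self._data[key] = value

    def get_range(self, begin, end):
        lo, hi = bisect_left(self._keys, begin), bisect_left(self._keys, end)
        return [(k, self._data[k]) for k in self._keys[lo:hi]]

class QueueLayer:
    """A tiny FIFO-queue "layer": a friendly push/items API that the
    underlying store knows nothing about -- it only sees ordered keys."""
    def __init__(self, kv, name):
        self.kv, self.prefix, self.seq = kv, b"queue/" + name + b"/", 0

    def push(self, value):
        # Monotonic sequence numbers keep items ordered in the keyspace.
        self.kv.set(self.prefix + self.seq.to_bytes(8, "big"), value)
        self.seq += 1

    def items(self):
        return [v for _, v in self.kv.get_range(self.prefix, self.prefix + b"\xff")]

kv = ToyKV()
q = QueueLayer(kv, b"jobs")
q.push(b"a"); q.push(b"b"); q.push(b"c")
print(q.items())  # -> [b'a', b'b', b'c']
```

The point is that the store's contract (ordered keys, transactions) is all a layer needs; correctness concerns live in the layer, not the storage engine.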
I could see this project renamed to simply "fdb-sqlite-layer"
> Their vision was to solve the hardest parts of building a database, such as transactions, fault tolerance, high availability, elastic scaling, etc. This would free users to build higher-level APIs ("layers") [1] / libraries [2] on top.
That is a very interesting, simple, and valuable insight that seems to be missing from the wiki page. But also from the wiki page <https://en.wikipedia.org/wiki/FoundationDB>, there is this:
--
The design of FoundationDB results in several limitations:
Long transactions - FoundationDB does not support transactions running longer than five seconds.
Large transactions - Transaction size cannot exceed 10 MB of total written keys and values.
Large keys and values - Keys cannot exceed 10 kB in size. Values cannot exceed 100 kB in size.
--
Those (unless worked around) would be absolute blockers to several systems I've worked on.
This project (mvSQLite) appears to have found a way around the 5 s transaction limit as well as the size limits, so that's really promising. That said, I believe the new Redwood storage engine in FDB 7.0+ is making inroads toward eliminating some of these limitations, and this project should also benefit from that new storage engine (prefix compression is a big one).
but now transactional guarantees only extend to the id stored in the DB, and not on the external storage.
Therefore, it's possible that the id is invalid (for the external storage) when referenced in the future. I think doing so only adds complexity as the system grows.
It would be better to chunk your blob data to fit the DB, imho. It beats introducing external blob storage in the long run.
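A sketch of that chunking approach, in pure Python: the 100 kB figure matches FDB's documented value limit, while the key scheme (blob id plus a big-endian chunk index, so a prefix range read returns chunks in order) is just one reasonable choice, not anything a particular system mandates.

```python
CHUNK = 100_000  # stay under FDB's ~100 kB value limit

def chunk_keys(blob_id: bytes, data: bytes):
    """Split a blob into (key, value) pairs that each fit in one value.
    Keys are blob_id + "/" + 8-byte big-endian index, so a range read
    over the blob_id prefix yields the chunks in order."""
    for i in range(0, max(len(data), 1), CHUNK):
        idx = (i // CHUNK).to_bytes(8, "big")
        yield blob_id + b"/" + idx, data[i:i + CHUNK]

def reassemble(pairs):
    """Rebuild the blob from its chunks; sorting by key restores order."""
    return b"".join(v for _, v in sorted(pairs))

blob = b"x" * 250_000                       # 250 kB -> 3 chunks
pairs = list(chunk_keys(b"blob42", blob))
print(len(pairs))                           # -> 3
assert reassemble(pairs) == blob
```

Writing all the chunks inside one transaction keeps the whole blob atomic, as long as the total stays under the 10 MB transaction limit; beyond that you'd need a multi-transaction protocol.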
> but now transactional guarantees only extend to the id stored in the DB, and not on the external storage.
Depends! If the ID is a cryptographic hash, then as long as the blob is uploaded first, the DB can't be inconsistent with the blob [1].
A Merkle Tree also allows "updates" by chunking the data into some convenient size, say 64 MB, and then building a new tree for each update and sticking that into the database.
[1] With the usual caveats that nobody is manually mucking about with the blob store, that it hasn't "lost" any blobs due to corruption, etc, etc...
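The content-addressed idea in a nutshell, using only `hashlib` (the chunk size and pairing scheme here are illustrative, not what any particular system uses): chunk the data, hash each chunk, then hash pairs of hashes up a tree. The root is the one ID the database stores, and any blob that hashes to that root is, by construction, the data the transaction referenced.

```python
import hashlib

def sha256(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def merkle_root(data: bytes, chunk_size: int = 64) -> bytes:
    """Compute a Merkle root over fixed-size chunks of `data`.
    Leaf level: hash of each chunk. Each higher level: hash of
    concatenated child pairs (odd levels duplicate the last hash)."""
    level = [sha256(data[i:i + chunk_size])
             for i in range(0, max(len(data), 1), chunk_size)]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

a = merkle_root(b"hello world" * 20)
b = merkle_root(b"hello world" * 20)
c = merkle_root(b"hello worlds" * 20)
assert a == b   # same content -> same ID, so dedupe is free
assert a != c   # any change -> different ID, so the DB can't silently go stale
```

An "update" then means building a new tree (most chunk hashes unchanged) and committing the new root in a transaction; the old root remains valid for readers until nothing references it.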
> With the usual caveats that nobody is manually mucking about with the blob store, that it hasn't "lost" any blobs due to corruption, etc, etc...
Yeah, with those caveats. But how do you make sure they apply? If someone does manually muck about with the blob store, or it does lose blobs due to corruption, then your transaction is retroactively "un-atomicized" with no trace thereof in the actual DB.
If corruption happens, then any guarantees by the hardware are voided, and the software guarantees (of durability) which are built on the hardware guarantees are equally voided. So corruption -> dead in the water.
But if you upload to the blob store first and then reference the id (hash or not) in your DB transaction, what happens if the transaction fails? You now have to work out a way to delete it from the external blob store, or change your application so that it doesn't matter if it's left on the blob store ('cept for the money it costs).
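One common answer to that failure mode is to accept the orphan and garbage-collect it later: upload the blob, attempt the DB transaction, and periodically sweep blobs that no row references (with a grace period so the sweeper doesn't race an in-flight transaction). A toy sketch, with in-memory dicts standing in for the blob store and the DB, and all names invented for illustration:

```python
import time

blob_store = {}   # blob_id -> (data, upload_time)
db_rows = {}      # row_id -> blob_id

def put_blob(blob_id, data):
    blob_store[blob_id] = (data, time.time())

def insert_row(row_id, blob_id, fail=False):
    # Simulated DB transaction. On failure the blob is already uploaded
    # and becomes an orphan until the sweeper runs.
    if fail:
        raise RuntimeError("transaction aborted")
    db_rows[row_id] = blob_id

def sweep_orphans(grace_seconds=0.0):
    """Delete blobs no row references, skipping very recent uploads so
    we don't collect a blob whose transaction hasn't committed yet."""
    referenced = set(db_rows.values())
    now = time.time()
    for blob_id in list(blob_store):
        _, uploaded = blob_store[blob_id]
        if blob_id not in referenced and now - uploaded >= grace_seconds:
            del blob_store[blob_id]

put_blob("b1", b"kept")
insert_row("r1", "b1")
put_blob("b2", b"orphaned")
try:
    insert_row("r2", "b2", fail=True)   # txn fails after upload
except RuntimeError:
    pass
sweep_orphans()
print(sorted(blob_store))  # -> ['b1']
```

The trade-off is that orphans temporarily cost storage money, but atomicity of the DB itself is never compromised: a blob exists before anything references it, and unreferenced blobs are eventually reclaimed.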
Running it locally is as easy as downloading and installing. Scaling FDB is a bit more of a challenge, partially due to their process-per-core design decision, which coincidentally helps make FDB as bulletproof as it is.
My previous employer runs it in production. It's not hard to scale, but at some point you will need multiple clusters (a cluster maxes out in practice at around 50 instances).
It's basically trouble-free unless you drop below 10% free space on any instance, at which point things go bad.
Not sure if I hit those limits; we were at around 100 nodes and 170-180 processes. The biggest thing we found was tuning the number of recruited proxies and other stateless roles. We were doing up to around 400k tps once we tuned those.
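For reference, stateless-role counts are changed through `fdbcli`'s `configure` command. The numbers below are placeholders, not recommendations, and the split grv/commit proxy roles are the FDB 7.x naming (older 6.x clusters use a single `proxies=` option):

```shell
# Recruit more stateless roles -- counts are illustrative only,
# tune against your own workload.
fdbcli --exec "configure grv_proxies=4 commit_proxies=8 logs=8"

# Verify what was actually recruited.
fdbcli --exec "status details"
```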
The problem here is when you try to recover the cluster by adding nodes - since FDB manages itself using transactions, this becomes very, very slow and painful if you've allowed multiple nodes to get into this state (which, because of balancing, is kind of how you get there).
Basically, FDB is great as long as you avoid this situation. If you don't, woe unto you trying to bring the cluster back online by adding nodes and hoping for rebalancing to fix things. It will, it's just very, very slow. I don't know if that's still true in the current version.
Good to know. And it seems to be tunable with `knob_min_available_space_ratio` [0], as 10% free space on a 4 TB drive would be 400 GB... not exactly hurting for space there.
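Knobs like that are passed per-process on the `fdbserver` command line (or in the `[fdbserver]` section of `foundationdb.conf`). A hedged example; the 0.05 value and cluster-file path are purely illustrative, so check the knob's default and semantics for your release before touching it:

```shell
# Illustrative only: set the minimum-free-space ratio explicitly for
# one fdbserver process. Verify the default for your FDB release first.
fdbserver --cluster-file /etc/foundationdb/fdb.cluster \
          --knob_min_available_space_ratio=0.05
```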
[1] https://github.com/FoundationDB/fdb-document-layer
[2] https://github.com/FoundationDB/fdb-record-layer
[3] https://www.youtube.com/watch?v=OJb8A6h9jQQ