This is exactly what the engineers behind FoundationDB (FDB) wanted when they open sourced it. For those who don't know, FDB provides a transactional (and distributed) ordered key-value store with a fairly simple but very powerful API.
Their vision was to solve the hardest parts of building a database, such as transactions, fault tolerance, high availability, elastic scaling, etc. This would free users to build higher-level APIs ("layers") [1] / libraries [2] on top.
The beauty of these layers is that you can basically remove doubt about the correctness of data once it leaves the layer. FoundationDB is one of the most (if not the most) tested [3] databases out there. I used it for over four years in high-write/high-read production environments, and not once did we second-guess our decision.
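To make the "layer" idea concrete, here's a toy sketch: a higher-level queue API built purely on ordered key-value operations. An in-memory sorted dict stands in for the cluster here; a real layer would issue the same operations against an FDB transaction. All class and key names are made up for illustration.

```python
from bisect import bisect_left

class ToyKV:
    """Ordered key-value store exposing a minimal FDB-like surface
    (set + range read). Stands in for a real FDB cluster."""
    def __init__(self):
        self._keys = []   # sorted list of byte-string keys
        self._data = {}

    def set(self, key, value):
        if key not in self._data:
            self._keys.insert(bisect_left(self._keys, key), key)
        self._data[key] = value

    def get_range(self, begin, end):
        lo, hi = bisect_left(self._keys, begin), bisect_left(self._keys, end)
        return [(k, self._data[k]) for k in self._keys[lo:hi]]

class QueueLayer:
    """A tiny FIFO-queue "layer": a friendly push/items API that the
    underlying store knows nothing about -- it only sees ordered keys."""
    def __init__(self, kv, name):
        self.kv, self.prefix, self.seq = kv, b"queue/" + name + b"/", 0

    def push(self, value):
        # Monotonic sequence numbers keep items ordered in the keyspace.
        self.kv.set(self.prefix + self.seq.to_bytes(8, "big"), value)
        self.seq += 1

    def items(self):
        return [v for _, v in self.kv.get_range(self.prefix, self.prefix + b"\xff")]

kv = ToyKV()
q = QueueLayer(kv, b"jobs")
q.push(b"a"); q.push(b"b"); q.push(b"c")
print(q.items())  # -> [b'a', b'b', b'c']
```

The point is that the store's contract (ordered keys, transactions) is all a layer needs; correctness concerns live in the layer, not the storage engine.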
I could see this project renamed to simply "fdb-sqlite-layer"
> Their vision was to solve the hardest parts of building a database, such as transactions, fault tolerance, high availability, elastic scaling, etc. This would free users to build higher-level APIs ("layers") [1] / libraries [2] on top.
That is a very interesting, simple, and valuable insight that seems to be missing from the wiki page. But also from the wiki page <https://en.wikipedia.org/wiki/FoundationDB>, there is this:
--
The design of FoundationDB results in several limitations:
Long transactions - FoundationDB does not support transactions running longer than five seconds.
Large transactions - Transaction size cannot exceed 10 MB of total written keys and values.
Large keys and values - Keys cannot exceed 10 kB in size. Values cannot exceed 100 kB in size.
--
Those (unless worked around) would be absolute blockers to several systems I've worked on.
This project (mvSQLite) appears to have found a way around the 5 s transaction limit as well as the size limits, so that's really promising. That said, I believe the new Redwood storage engine in FDB 7.0+ is making inroads toward eliminating some of these limitations, and this project should also benefit from that new storage engine (prefix compression is a big one).
but now transactional guarantees only extend to the id stored in the DB, and not on the external storage.
Therefore, it's possible that the id is invalid (for the external storage) when referenced in the future. I think doing so only adds complexity as the system grows.
It would be better to chunk your blob data to fit the DB, imho. It beats introducing external blob storage in the long run.
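A sketch of that chunking approach, in pure Python: the 100 kB figure matches FDB's documented value limit, while the key scheme (blob id plus a big-endian chunk index, so a prefix range read returns chunks in order) is just one reasonable choice, not anything a particular system mandates.

```python
CHUNK = 100_000  # stay under FDB's ~100 kB value limit

def chunk_keys(blob_id: bytes, data: bytes):
    """Split a blob into (key, value) pairs that each fit in one value.
    Keys are blob_id + "/" + 8-byte big-endian index, so a range read
    over the blob_id prefix yields the chunks in order."""
    for i in range(0, max(len(data), 1), CHUNK):
        idx = (i // CHUNK).to_bytes(8, "big")
        yield blob_id + b"/" + idx, data[i:i + CHUNK]

def reassemble(pairs):
    """Rebuild the blob from its chunks; sorting by key restores order."""
    return b"".join(v for _, v in sorted(pairs))

blob = b"x" * 250_000                       # 250 kB -> 3 chunks
pairs = list(chunk_keys(b"blob42", blob))
print(len(pairs))                           # -> 3
assert reassemble(pairs) == blob
```

Writing all the chunks inside one transaction keeps the whole blob atomic, as long as the total stays under the 10 MB transaction limit; beyond that you'd need a multi-transaction protocol.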
> but now transactional guarantees only extend to the id stored in the DB, and not on the external storage.
Depends! If the ID is a cryptographic hash, then as long as the blob is uploaded first, the DB can't be inconsistent with the blob [1].
A Merkle Tree also allows "updates" by chunking the data into some convenient size, say 64 MB, and then building a new tree for each update and sticking that into the database.
[1] With the usual caveats that nobody is manually mucking about with the blob store, that it hasn't "lost" any blobs due to corruption, etc, etc...
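The content-addressed idea in a nutshell, using only `hashlib` (the chunk size and pairing scheme here are illustrative, not what any particular system uses): chunk the data, hash each chunk, then hash pairs of hashes up a tree. The root is the one ID the database stores, and any blob that hashes to that root is, by construction, the data the transaction referenced.

```python
import hashlib

def sha256(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def merkle_root(data: bytes, chunk_size: int = 64) -> bytes:
    """Compute a Merkle root over fixed-size chunks of `data`.
    Leaf level: hash of each chunk. Each higher level: hash of
    concatenated child pairs (odd levels duplicate the last hash)."""
    level = [sha256(data[i:i + chunk_size])
             for i in range(0, max(len(data), 1), chunk_size)]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

a = merkle_root(b"hello world" * 20)
b = merkle_root(b"hello world" * 20)
c = merkle_root(b"hello worlds" * 20)
assert a == b   # same content -> same ID, so dedupe is free
assert a != c   # any change -> different ID, so the DB can't silently go stale
```

An "update" then means building a new tree (most chunk hashes unchanged) and committing the new root in a transaction; the old root remains valid for readers until nothing references it.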
> With the usual caveats that nobody is manually mucking about with the blob store, that it hasn't "lost" any blobs due to corruption, etc, etc...
Yeah, with those caveats. But how do you make sure they apply? If someone does manually muck about with the blob store, or it does lose blobs due to corruption, then your transaction is retroactively "un-atomicized" with no trace thereof in the actual DB.
If corruption happens, then any guarantees by the hardware are voided, and the software guarantees (of durability) which are built on the hardware guarantees are equally voided. So corruption -> dead in the water.
But if you upload to the blob store first and then reference the id (hash or not) in your DB transaction, what happens if the transaction fails? You now have to work out a way to delete it from the external blob store, or change your application so that it doesn't matter if it's left on the blob store ('cept for the money it costs).
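One common answer to that failure mode is to accept the orphan and garbage-collect it later: upload the blob, attempt the DB transaction, and periodically sweep blobs that no row references (with a grace period so the sweeper doesn't race an in-flight transaction). A toy sketch, with in-memory dicts standing in for the blob store and the DB, and all names invented for illustration:

```python
import time

blob_store = {}   # blob_id -> (data, upload_time)
db_rows = {}      # row_id -> blob_id

def put_blob(blob_id, data):
    blob_store[blob_id] = (data, time.time())

def insert_row(row_id, blob_id, fail=False):
    # Simulated DB transaction. On failure the blob is already uploaded
    # and becomes an orphan until the sweeper runs.
    if fail:
        raise RuntimeError("transaction aborted")
    db_rows[row_id] = blob_id

def sweep_orphans(grace_seconds=0.0):
    """Delete blobs no row references, skipping very recent uploads so
    we don't collect a blob whose transaction hasn't committed yet."""
    referenced = set(db_rows.values())
    now = time.time()
    for blob_id in list(blob_store):
        _, uploaded = blob_store[blob_id]
        if blob_id not in referenced and now - uploaded >= grace_seconds:
            del blob_store[blob_id]

put_blob("b1", b"kept")
insert_row("r1", "b1")
put_blob("b2", b"orphaned")
try:
    insert_row("r2", "b2", fail=True)   # txn fails after upload
except RuntimeError:
    pass
sweep_orphans()
print(sorted(blob_store))  # -> ['b1']
```

The trade-off is that orphans temporarily cost storage money, but atomicity of the DB itself is never compromised: a blob exists before anything references it, and unreferenced blobs are eventually reclaimed.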
Running it locally is as easy as downloading and installing. Scaling FDB is a bit more of a challenge, partially due to their process-per-core design decision, which coincidentally helps make FDB as bulletproof as it is.
My previous employer runs it in production. It's not hard to scale, but at some point you will need multiple clusters (a cluster maxes out in practice at around 50 instances).
It's basically trouble-free unless you drop below 10% free space on any instance, at which point things go bad.
Not sure if I hit those limits; we were at around 100 nodes and 170-180 processes. The biggest thing we found was tuning the number of recruited proxies and other stateless roles. We were doing up to around 400k tps once we tuned those.
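For reference, stateless-role counts are changed through `fdbcli`'s `configure` command. The numbers below are placeholders, not recommendations, and the split grv/commit proxy roles are the FDB 7.x naming (older 6.x clusters use a single `proxies=` option):

```shell
# Recruit more stateless roles -- counts are illustrative only,
# tune against your own workload.
fdbcli --exec "configure grv_proxies=4 commit_proxies=8 logs=8"

# Verify what was actually recruited.
fdbcli --exec "status details"
```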
The problem here is when you try to recover the cluster by adding nodes - since FDB manages itself using transactions, this becomes very, very slow and painful if you've allowed multiple nodes to get into this state (which, because of balancing, is kind of how you get there).
Basically, FDB is great as long as you avoid this situation. If you don't, woe unto you trying to bring the cluster back online by adding nodes and hoping for rebalancing to fix things. It will, it's just very, very slow. I don't know if that's still true in the current version.
Good to know. And it seems to be tunable with `knob_min_available_space_ratio` [0], as 10% free space on a 4 TB drive would be 400 GB... not exactly hurting for space there.
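Knobs like that are passed per-process on the `fdbserver` command line (or in the `[fdbserver]` section of `foundationdb.conf`). A hedged example; the 0.05 value and cluster-file path are purely illustrative, so check the knob's default and semantics for your release before touching it:

```shell
# Illustrative only: set the minimum-free-space ratio explicitly for
# one fdbserver process. Verify the default for your FDB release first.
fdbserver --cluster-file /etc/foundationdb/fdb.cluster \
          --knob_min_available_space_ratio=0.05
```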
[1] https://github.com/FoundationDB/fdb-document-layer
[2] https://github.com/FoundationDB/fdb-record-layer
[3] https://www.youtube.com/watch?v=OJb8A6h9jQQ