
Interesting.

> And that's stored in Dynamodb, which is very much a KV store and it was somewhat of a nightmare to keep track of these things.

What problems did you face keeping track of these things in a KV store?

The good thing about BadgerDB in our case is that its data files live on the same filesystem as our file-based manifests, so in essence it's just a fancy file-based manifest that supports features like efficient append-only operations, partial reads, crash recovery, etc. In theory we could implement all of those features on our existing manifest files; with BadgerDB we get them out of the box, and snapshots still work as expected.

So we are trying to move all the information in the manifest files into a separate BadgerDB instance for each shard. But until the source of truth lives in both the manifest files and BadgerDB (while we are in transition), we have to do the dance of synchronizing data between the two.
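For the transition period described above, a dual-write wrapper is one way to keep the two sources of truth in step. A minimal sketch in Go, assuming one BadgerDB instance per shard; the key scheme, shard layout, and the `writeManifestEntry` helper are illustrative, not the actual implementation:

  package manifest

  import (
  	"fmt"
  	"os"

  	badger "github.com/dgraph-io/badger/v4"
  )

  // ShardStore keeps a per-shard BadgerDB alongside the legacy manifest file.
  // During the transition, every entry is written to both.
  type ShardStore struct {
  	db           *badger.DB
  	manifestPath string // legacy file-based manifest for this shard
  }

  func OpenShardStore(shardDir, manifestPath string) (*ShardStore, error) {
  	// Badger's files live on the same filesystem as the manifests,
  	// so filesystem snapshots capture both consistently.
  	db, err := badger.Open(badger.DefaultOptions(shardDir))
  	if err != nil {
  		return nil, err
  	}
  	return &ShardStore{db: db, manifestPath: manifestPath}, nil
  }

  // Put records a manifest entry in BadgerDB and in the legacy file,
  // keeping the two in sync while both are sources of truth.
  func (s *ShardStore) Put(key, value []byte) error {
  	err := s.db.Update(func(txn *badger.Txn) error {
  		return txn.Set(key, value)
  	})
  	if err != nil {
  		return fmt.Errorf("badger write: %w", err)
  	}
  	return writeManifestEntry(s.manifestPath, key, value)
  }

  // writeManifestEntry appends key/value to the legacy manifest file
  // (hypothetical stand-in for the existing manifest writer).
  func writeManifestEntry(path string, key, value []byte) error {
  	f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
  	if err != nil {
  		return err
  	}
  	defer f.Close()
  	_, err = fmt.Fprintf(f, "%s=%s\n", key, value)
  	return err
  }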



> good thing about badgerdb in our case is badgerDB data files live on the same filesystem as our file based manifests

I think that's pretty much the difference - once you start storing this in a distributed system (like DynamoDB), you get weird timing issues around how to serialize operations that could potentially step on each other. Consistency in distributed systems is usually a headache when you let users pick which key they want to create (basically anyone could create a path named "/tmp/foo.txt" at the same time from anywhere in the platform).
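To make the key-collision point concrete: the usual way to serialize concurrent creates of the same user-chosen key in DynamoDB is a conditional write, so only one writer wins. A rough sketch with the AWS SDK for Go v2 (the table and attribute names are made up for illustration):

  package main

  import (
  	"context"
  	"errors"
  	"log"

  	"github.com/aws/aws-sdk-go-v2/aws"
  	"github.com/aws/aws-sdk-go-v2/config"
  	"github.com/aws/aws-sdk-go-v2/service/dynamodb"
  	"github.com/aws/aws-sdk-go-v2/service/dynamodb/types"
  )

  func main() {
  	ctx := context.Background()
  	cfg, err := config.LoadDefaultConfig(ctx)
  	if err != nil {
  		log.Fatal(err)
  	}
  	client := dynamodb.NewFromConfig(cfg)

  	// Create "/tmp/foo.txt" only if nobody else has created it yet;
  	// concurrent writers racing on the same path get a conditional-check failure.
  	_, err = client.PutItem(ctx, &dynamodb.PutItemInput{
  		TableName: aws.String("file-metadata"), // hypothetical table name
  		Item: map[string]types.AttributeValue{
  			"path":  &types.AttributeValueMemberS{Value: "/tmp/foo.txt"},
  			"owner": &types.AttributeValueMemberS{Value: "user-123"},
  		},
  		ConditionExpression:      aws.String("attribute_not_exists(#p)"),
  		ExpressionAttributeNames: map[string]string{"#p": "path"},
  	})

  	var conflict *types.ConditionalCheckFailedException
  	if errors.As(err, &conflict) {
  		log.Println("someone else created this path first")
  	} else if err != nil {
  		log.Fatal(err)
  	}
  }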

From my point of view, you have built a good WAL implementation using BadgerDB instead of doing the thankless & distracting work of separating fsync from fdatasync.
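On the durability side, BadgerDB exposes this as an option rather than making you manage the sync calls yourself. A minimal sketch of a WAL-style append with synchronous writes (the directory path and key scheme are illustrative):

  package main

  import (
  	"encoding/binary"
  	"log"

  	badger "github.com/dgraph-io/badger/v4"
  )

  func main() {
  	// Open with synchronous writes so a committed entry is on disk
  	// before Update returns; Badger handles the sync bookkeeping.
  	opts := badger.DefaultOptions("/data/shard-0/wal").WithSyncWrites(true)
  	db, err := badger.Open(opts)
  	if err != nil {
  		log.Fatal(err)
  	}
  	defer db.Close()

  	// WAL-style usage: monotonically increasing keys, entries never rewritten.
  	key := make([]byte, 8)
  	binary.BigEndian.PutUint64(key, 1) // sequence number of this entry
  	err = db.Update(func(txn *badger.Txn) error {
  		return txn.Set(key, []byte("manifest entry payload"))
  	})
  	if err != nil {
  		log.Fatal(err)
  	}
  }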

> So we are trying to move all the information in manifest files to separate badgerDB instances for each shard

And as you mentioned, that makes sense, because you don't want to put a WAL in a generic mutable store (instead of in append-only logs). Rather than adopting a separate append-only store and switching over to it, it makes sense to move the whole thing in.

I haven't touched this area in a decade, but when I had to work with zBase at Zynga (my org built it, but I paid attention to the serialization + compression, not the storage persistence), its metadata store was sqlite files (the "file manifests" would be sqlite files), which was ridiculously simple because the writers were a single machine + a single mutex (pinned to a core to reduce jitter) recording metadata. This was also built in the era of spinning rust and specifically for EBS (so there was a lot of group-commit coalescing of requests, which naturally flowed into BEGIN DEFERRED).
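That single-writer + group-commit pattern is easy to sketch. Assuming Go with mattn/go-sqlite3 (zBase itself was not written this way; the schema and batching here are invented for illustration), coalescing metadata updates under one deferred transaction looks roughly like:

  package metastore

  import (
  	"database/sql"
  	"sync"

  	_ "github.com/mattn/go-sqlite3"
  )

  // Entry is one metadata record to persist.
  type Entry struct {
  	Path string
  	Size int64
  }

  // MetaStore serializes all writes through a single mutex, mirroring the
  // single-machine/single-writer setup described above.
  type MetaStore struct {
  	mu sync.Mutex
  	db *sql.DB
  }

  func Open(path string) (*MetaStore, error) {
  	db, err := sql.Open("sqlite3", path)
  	if err != nil {
  		return nil, err
  	}
  	if _, err := db.Exec(`CREATE TABLE IF NOT EXISTS manifest (path TEXT PRIMARY KEY, size INTEGER)`); err != nil {
  		return nil, err
  	}
  	return &MetaStore{db: db}, nil
  }

  // RecordBatch coalesces many entries into one transaction (one commit, one
  // sync), the group-commit behaviour that played well with EBS latency.
  // go-sqlite3 starts transactions with a plain BEGIN, i.e. deferred.
  func (m *MetaStore) RecordBatch(entries []Entry) error {
  	m.mu.Lock()
  	defer m.mu.Unlock()

  	tx, err := m.db.Begin()
  	if err != nil {
  		return err
  	}
  	for _, e := range entries {
  		if _, err := tx.Exec(
  			`INSERT OR REPLACE INTO manifest (path, size) VALUES (?, ?)`,
  			e.Path, e.Size,
  		); err != nil {
  			tx.Rollback()
  			return err
  		}
  	}
  	return tx.Commit()
  }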

It was a joy to debug in a lot of situations, and I was happy there wasn't some custom protobuf parser needed to join + merge + update the data on disk while the system was turned off.



