
Interesting.

> And that's stored in Dynamodb, which is very much a KV store and it was somewhat of a nightmare to keep track of these things.

What problems did you face keeping track of these things in a KV store?

The good thing about BadgerDB in our case is that its data files live on the same filesystem as our file-based manifests, so in essence it's just a fancy file-based manifest that supports features like efficient append-only operations, partial reads, crash recovery, etc. In theory we could implement all of those features on our existing manifest files; with BadgerDB we get them out of the box, and snapshots still work as expected.

So we are trying to move all the information in the manifest files into a separate BadgerDB instance for each shard. But until the source of truth lives in both the manifest files and BadgerDB (while we are in transition), we have to do the dance of synchronizing data between the two.
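For the transition period described above, a dual-write wrapper is one way to keep the two sources of truth in step. A minimal sketch in Go, assuming one BadgerDB instance per shard; the key scheme, shard layout, and the `writeManifestEntry` helper are illustrative, not the actual implementation:

  package manifest

  import (
  	"fmt"
  	"os"

  	badger "github.com/dgraph-io/badger/v4"
  )

  // ShardStore keeps a per-shard BadgerDB alongside the legacy manifest file.
  // During the transition, every entry is written to both.
  type ShardStore struct {
  	db           *badger.DB
  	manifestPath string // legacy file-based manifest for this shard
  }

  func OpenShardStore(shardDir, manifestPath string) (*ShardStore, error) {
  	// Badger's files live on the same filesystem as the manifests,
  	// so filesystem snapshots capture both consistently.
  	db, err := badger.Open(badger.DefaultOptions(shardDir))
  	if err != nil {
  		return nil, err
  	}
  	return &ShardStore{db: db, manifestPath: manifestPath}, nil
  }

  // Put records a manifest entry in BadgerDB and in the legacy file,
  // keeping the two in sync while both are sources of truth.
  func (s *ShardStore) Put(key, value []byte) error {
  	err := s.db.Update(func(txn *badger.Txn) error {
  		return txn.Set(key, value)
  	})
  	if err != nil {
  		return fmt.Errorf("badger write: %w", err)
  	}
  	return writeManifestEntry(s.manifestPath, key, value)
  }

  // writeManifestEntry appends key/value to the legacy manifest file
  // (hypothetical stand-in for the existing manifest writer).
  func writeManifestEntry(path string, key, value []byte) error {
  	f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
  	if err != nil {
  		return err
  	}
  	defer f.Close()
  	_, err = fmt.Fprintf(f, "%s=%s\n", key, value)
  	return err
  }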



> good thing about badgerdb in our case is badgerDB data files live on the same filesystem as our file based manifests

I think that's pretty much the difference - once you start storing this in a distributed system (like DynamoDB), you get weird timing issues around how to serialize operations that could potentially step on each other. Consistency in distributed systems is usually a headache when you let users pick which key they want to create (basically anyone could create a path named "/tmp/foo.txt" at the same time from anywhere in the platform).
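To make the key-collision point concrete: the usual way to serialize concurrent creates of the same user-chosen key in DynamoDB is a conditional write, so only one writer wins. A rough sketch with the AWS SDK for Go v2 (the table and attribute names are made up for illustration):

  package main

  import (
  	"context"
  	"errors"
  	"log"

  	"github.com/aws/aws-sdk-go-v2/aws"
  	"github.com/aws/aws-sdk-go-v2/config"
  	"github.com/aws/aws-sdk-go-v2/service/dynamodb"
  	"github.com/aws/aws-sdk-go-v2/service/dynamodb/types"
  )

  func main() {
  	ctx := context.Background()
  	cfg, err := config.LoadDefaultConfig(ctx)
  	if err != nil {
  		log.Fatal(err)
  	}
  	client := dynamodb.NewFromConfig(cfg)

  	// Create "/tmp/foo.txt" only if nobody else has created it yet;
  	// concurrent writers racing on the same path get a conditional-check failure.
  	_, err = client.PutItem(ctx, &dynamodb.PutItemInput{
  		TableName: aws.String("file-metadata"), // hypothetical table name
  		Item: map[string]types.AttributeValue{
  			"path":  &types.AttributeValueMemberS{Value: "/tmp/foo.txt"},
  			"owner": &types.AttributeValueMemberS{Value: "user-123"},
  		},
  		ConditionExpression:      aws.String("attribute_not_exists(#p)"),
  		ExpressionAttributeNames: map[string]string{"#p": "path"},
  	})

  	var conflict *types.ConditionalCheckFailedException
  	if errors.As(err, &conflict) {
  		log.Println("someone else created this path first")
  	} else if err != nil {
  		log.Fatal(err)
  	}
  }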

From my point of view, you have built a good WAL implementation using BadgerDB instead of doing the thankless & distracting work of separating fsync from fdatasync.
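On the durability side, BadgerDB exposes this as an option rather than making you manage the sync calls yourself. A minimal sketch of a WAL-style append with synchronous writes (the directory path and key scheme are illustrative):

  package main

  import (
  	"encoding/binary"
  	"log"

  	badger "github.com/dgraph-io/badger/v4"
  )

  func main() {
  	// Open with synchronous writes so a committed entry is on disk
  	// before Update returns; Badger handles the sync bookkeeping.
  	opts := badger.DefaultOptions("/data/shard-0/wal").WithSyncWrites(true)
  	db, err := badger.Open(opts)
  	if err != nil {
  		log.Fatal(err)
  	}
  	defer db.Close()

  	// WAL-style usage: monotonically increasing keys, entries never rewritten.
  	key := make([]byte, 8)
  	binary.BigEndian.PutUint64(key, 1) // sequence number of this entry
  	err = db.Update(func(txn *badger.Txn) error {
  		return txn.Set(key, []byte("manifest entry payload"))
  	})
  	if err != nil {
  		log.Fatal(err)
  	}
  }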

> So we are trying to move all the information in manifest files to separate badgerDB instances for each shard

And as you mentioned, that makes sense, because you don't want to put a WAL in a generic mutable store (instead of in append-only logs). Rather than adopting a separate append-only store and switching over to it, it makes sense to move the whole thing in.

I haven't touched this area in a decade, but when I had to work with zBase at Zynga (my org built it, but I paid attention to the serialization + compression, not the storage persistence), its metadata store was sqlite files (the "file manifests" would be sqlite files), which was ridiculously simple because the writers were a single machine + a single mutex (pinned to a core to reduce jitter) recording metadata. This was also built in the era of spinning rust and specifically for EBS (so there was a lot of group-commit coalescing of requests, which naturally flowed into BEGIN DEFERRED).
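That single-writer + group-commit pattern is easy to sketch. Assuming Go with mattn/go-sqlite3 (zBase itself was not written this way; the schema and batching here are invented for illustration), coalescing metadata updates under one deferred transaction looks roughly like:

  package metastore

  import (
  	"database/sql"
  	"sync"

  	_ "github.com/mattn/go-sqlite3"
  )

  // Entry is one metadata record to persist.
  type Entry struct {
  	Path string
  	Size int64
  }

  // MetaStore serializes all writes through a single mutex, mirroring the
  // single-machine/single-writer setup described above.
  type MetaStore struct {
  	mu sync.Mutex
  	db *sql.DB
  }

  func Open(path string) (*MetaStore, error) {
  	db, err := sql.Open("sqlite3", path)
  	if err != nil {
  		return nil, err
  	}
  	if _, err := db.Exec(`CREATE TABLE IF NOT EXISTS manifest (path TEXT PRIMARY KEY, size INTEGER)`); err != nil {
  		return nil, err
  	}
  	return &MetaStore{db: db}, nil
  }

  // RecordBatch coalesces many entries into one transaction (one commit, one
  // sync), the group-commit behaviour that played well with EBS latency.
  // go-sqlite3 starts transactions with a plain BEGIN, i.e. deferred.
  func (m *MetaStore) RecordBatch(entries []Entry) error {
  	m.mu.Lock()
  	defer m.mu.Unlock()

  	tx, err := m.db.Begin()
  	if err != nil {
  		return err
  	}
  	for _, e := range entries {
  		if _, err := tx.Exec(
  			`INSERT OR REPLACE INTO manifest (path, size) VALUES (?, ?)`,
  			e.Path, e.Size,
  		); err != nil {
  			tx.Rollback()
  			return err
  		}
  	}
  	return tx.Commit()
  }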

It was a joy to debug in a lot of situations, and I was happy there wasn't some custom protobuf parser needed to join + merge + update the data on disk while the system was turned off.



