How do you compare with DVC and LakeFS?

jtagliabuetooso · on April 16, 2025

Thanks for the question!

On the data side, DVC is mostly about versioning static datasets and local files, while Bauplan manages your entire lakehouse: potentially hundreds of tables, with point-in-time versioning (time travel) and branching (different versions of the same table coexisting at any given time) -> https://docs.bauplanlabs.com/en/latest/tutorial/02_catalog.h....
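To make "time travel" and "branching" concrete, here is a toy in-memory model. It is a hypothetical sketch, not Bauplan's API: `ToyCatalog`, `commit`, `create_branch` and `read` are invented names. Each branch is a list of immutable snapshots mapping table names to version ids, so reading an older snapshot is time travel, and two branches can hold different versions of the same table at once.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical toy model (not Bauplan's API): each branch is a list of
# snapshots, and a snapshot maps table name -> table version id.
@dataclass
class ToyCatalog:
    branches: dict[str, list[dict[str, str]]] = field(
        default_factory=lambda: {"main": [{}]}
    )

    def commit(self, branch: str, table: str, version: str) -> int:
        """Record a new table version on a branch; return the commit index."""
        history = self.branches[branch]
        snapshot = dict(history[-1])  # copy the latest state, never mutate old ones
        snapshot[table] = version
        history.append(snapshot)
        return len(history) - 1

    def create_branch(self, name: str, from_branch: str = "main") -> None:
        """Start a new branch from the source branch's latest snapshot."""
        self.branches[name] = [dict(self.branches[from_branch][-1])]

    def read(self, branch: str, table: str, at: int | None = None) -> str | None:
        """Read a table on a branch, optionally as of an earlier commit (time travel)."""
        history = self.branches[branch]
        snapshot = history[-1] if at is None else history[at]
        return snapshot.get(table)


cat = ToyCatalog()
cat.commit("main", "orders", "v1")
cat.create_branch("dev")
cat.commit("dev", "orders", "v2")
print(cat.read("main", "orders"))        # v1
print(cat.read("dev", "orders"))         # v2
print(cat.read("main", "orders", at=0))  # None (before the first commit)
```

Copying the latest snapshot on every commit keeps history immutable, which is what makes reads "as of" an earlier commit trivial in this toy. A real lakehouse would share unchanged data files between snapshots instead of copying anything.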

On the compute side, Bauplan runs the functions for you. Catalogs, by contrast, see only part of the truth and provide only one piece of the puzzle. Bauplan knows both your code (because it runs your pipeline) and your data (because it handles all the commits on the lakehouse), which allows a one-line answer to questions such as:

"Who changed this table on this branch, when, and with which code?"
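The reason that question becomes a one-liner can be sketched as follows. Suppose, purely as an assumption for illustration, that every table change is stored as a commit record carrying the author, a timestamp, and a fingerprint of the code that produced it. `Commit`, `hash_code` and `last_change` are hypothetical names, not part of any Bauplan interface. A system that only catalogs data would have no `code_hash` to record.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


# Hypothetical commit record: since the same system runs the code and
# writes the data, it can store who, when, and which code for each change.
@dataclass(frozen=True)
class Commit:
    branch: str
    table: str
    author: str
    timestamp: datetime
    code_hash: str


def hash_code(source: str) -> str:
    """Fingerprint the pipeline code that produced a table version."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()[:12]


def last_change(log: list[Commit], table: str, branch: str) -> Commit | None:
    """Who changed this table on this branch, when, and with which code?"""
    matches = [c for c in log if c.table == table and c.branch == branch]
    return max(matches, key=lambda c: c.timestamp, default=None)


log = [
    Commit("main", "orders", "alice", datetime(2025, 4, 1, tzinfo=timezone.utc),
           hash_code("SELECT * FROM raw")),
    Commit("dev", "orders", "bob", datetime(2025, 4, 2, tzinfo=timezone.utc),
           hash_code("SELECT id FROM raw")),
    Commit("main", "orders", "carol", datetime(2025, 4, 3, tzinfo=timezone.utc),
           hash_code("SELECT id, ts FROM raw")),
]

latest = last_change(log, "orders", "main")
print(latest.author, latest.timestamp.date())  # carol 2025-04-03
```

Once code and data commits live in the same log, the lineage question is just a filter plus a max over timestamps.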

It also enables many optimizations in multi-player mode, such as efficient caching of data (https://arxiv.org/abs/2411.08203) and of packages (https://arxiv.org/pdf/2410.17465).