Hacker News

Delta Lake is, in part, a marketing term from Databricks to promote their Delta file format and all the clusters you'll be spinning up.

Delta files are actually an amazing format, though. At their most basic, they take Parquet files (a columnar storage format) and let you stream off them really easily. That takes a lot of complexity out of your pipelines: you don't need Kafka for everything, and you don't need to figure out when new rows get added (or build a whole other set of jobs around that).
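The "stream off them easily" part comes down to Delta's transaction log: each commit is a numbered JSON file listing the data files it added, so an incremental reader only has to remember the last commit version it processed. Here's a toy sketch of that idea in plain Python. It is illustrative only, not the real Delta protocol (which also has checkpoints, remove actions, schema metadata, etc.); all names here are made up for the example.

```python
import json
import os
import tempfile

# Toy sketch of Delta-style incremental reads: each commit is a JSON file
# in a log directory listing the data files it added. A streaming reader
# just remembers the last commit version it saw and picks up anything
# newer -- no Kafka needed.

def commit(log_dir, version, added_files):
    # Write a commit file named like Delta's zero-padded log entries.
    path = os.path.join(log_dir, f"{version:020d}.json")
    with open(path, "w") as f:
        json.dump({"add": added_files}, f)

def read_new_files(log_dir, last_seen_version):
    # Return (latest_version, data files added since last_seen_version).
    new_files, latest = [], last_seen_version
    for name in sorted(os.listdir(log_dir)):
        version = int(name.split(".")[0])
        if version > last_seen_version:
            with open(os.path.join(log_dir, name)) as f:
                new_files.extend(json.load(f)["add"])
            latest = max(latest, version)
    return latest, new_files

log_dir = tempfile.mkdtemp()
commit(log_dir, 0, ["part-000.parquet"])
commit(log_dir, 1, ["part-001.parquet"])

cursor, files = read_new_files(log_dir, -1)      # first micro-batch: everything so far
commit(log_dir, 2, ["part-002.parquet"])         # a writer appends new data
cursor, fresh = read_new_files(log_dir, cursor)  # next batch: only the new file
```

In the real thing, `spark.readStream.format("delta")` does this bookkeeping for you, which is why new rows just show up in the stream without a separate change-detection job.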

But using Delta files really can change the way you develop pipelines (and ML pipelines), so I forgive them for inventing a new term.



As someone familiar with what a Lakehouse architecture is, I remain confused about what Delta Lake is, mostly because I find it difficult to separate Databricks' Delta Lake marketing from the virtues of the Delta Lake architecture itself (Delta files, etc.). It's frustrating, and I've given up...

I am, however, keeping up to date with Apache Iceberg, because it's much easier to follow and it seems to have a lot of advantages over the Delta Lake table format (Delta files?).

Iceberg seems better, especially at handling schema evolution and drift. Both seem to use Parquet and Avro below the surface and share broadly the same design, but am I missing anything by dismissing and ignoring Delta Lake?
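For readers unfamiliar with the schema-evolution point above: the problem both formats solve is that older data files lack columns that newer files have, so the reader must union the schemas and fill the gaps with nulls. A toy sketch of that read-time merge, in plain Python (illustrative only; the real table formats track the authoritative schema in table metadata rather than inferring it per file, and handle renames/type changes too):

```python
# Toy sketch of read-time schema merging ("schema evolution"): files
# written before an ADD COLUMN are missing that column, so the reader
# unions all schemas and fills missing values with None.

old_batch = [{"id": 1, "name": "a"}]                   # written before the new column
new_batch = [{"id": 2, "name": "b", "country": "NZ"}]  # written after ADD COLUMN

def merged_schema(batches):
    # Union of all column names, preserving first-seen order.
    cols = []
    for batch in batches:
        for row in batch:
            for col in row:
                if col not in cols:
                    cols.append(col)
    return cols

def read_with_merged_schema(batches):
    cols = merged_schema(batches)
    return [{c: row.get(c) for c in cols}
            for batch in batches for row in batch]

rows = read_with_merged_schema([old_batch, new_batch])
```

The differences people cite between Iceberg and Delta here are mostly about how robustly each tracks this metadata across renames, drops, and partition changes, not about the basic null-filling mechanic.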



