
> Changes in format, different partitioning strategies, schema changes – through all of it the receiver’s view remains the same.

I don't understand this - if I start saving those files in a different format how will it continue to work? Why would the view remain the same if I just rename columns, even?



I think the implication is that the user only ever references the view, so as long as you update the view in concert with the data format change, the user is none the wiser.
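To make that concrete, here's a rough sketch of the indirection. I'm using Python's built-in sqlite3 as a stand-in so it runs without installing anything; the same idea should carry over to a view in a DuckDB file. The table, column, and view names (events_v1, events_v2, events, and so on) are made up for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Version 1 of the storage layout (hypothetical table and column names).
con.execute("CREATE TABLE events_v1 (user_id INTEGER, amount REAL)")
con.executemany("INSERT INTO events_v1 VALUES (?, ?)", [(1, 9.5), (2, 3.0)])

# The view is the only thing the consumer is told about.
con.execute("CREATE VIEW events AS SELECT user_id, amount FROM events_v1")

# The consumer's query never changes.
CONSUMER_QUERY = "SELECT user_id, amount FROM events ORDER BY user_id"
before = con.execute(CONSUMER_QUERY).fetchall()

# Version 2: the publisher renames the columns and moves the data.
con.execute("CREATE TABLE events_v2 (uid INTEGER, amount_usd REAL)")
con.execute("INSERT INTO events_v2 SELECT user_id, amount FROM events_v1")

# The view is redefined in the same step, mapping new names to the old ones.
con.execute("DROP VIEW events")
con.execute(
    "CREATE VIEW events AS "
    "SELECT uid AS user_id, amount_usd AS amount FROM events_v2"
)
con.execute("DROP TABLE events_v1")

after = con.execute(CONSUMER_QUERY).fetchall()

# Same query, same result, even though the underlying schema changed.
assert before == after
```

If the publisher renamed the columns without touching the view, the consumer's query would break, which is why the two have to change together.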


Oh - I see. Because the .db file is centralised, you can connect to it from somewhere and do stuff to it. I think I missed that the DB is like that.

Is it like dumping a SQLite database somewhere with a view in it, and connecting to that as well? Or does DuckDB have more magic to transfer less data when running the query?


Yeah, I guess you could equivalently put a SQLite database with a view or virtual table in S3 which would give you the same level of indirection and API abstraction provided by this mechanism.

Where DuckDB will have an advantage is in processing speed for OLAP-type queries. I don't know what the current state of SQLite virtual tables for parquet files on S3 is, but DuckDB has a number of optimisations for this, like reading only the required columns and row groups via HTTP range requests. SQLite has a row-oriented processing model, so I suspect it wouldn't do that in the same way unless there is a specific vtable extension for it.

You can get a comparable benefit for data stored in a SQLite db itself with the project from the following blog post, but that wouldn't apply to collections of parquet files: https://phiresky.github.io/blog/2021/hosting-sqlite-database...


Yes - makes sense. I imagine I'd only do this with SQLite on a SQLite database, rather than on heterogeneous data sources.



