> Changes in format, different partitioning strategies, schema changes – through...

snthpy · on May 30, 2024

I think what was implied was that the user only references the view, so as long as you update the view in concert with the data format change, the user is none the wiser.

robertlagrant · on May 31, 2024

Oh - I see. Because the .db file is centralised, you can connect to it from somewhere and do stuff to it. I think I missed that the DB is like that.

Is it like dumping a SQLite database somewhere with a view in it, and connecting over that as well? Or does DuckDB have more magic to transfer less data when running the query?

snthpy · on May 31, 2024

Yeah, I guess you could equivalently put a SQLite database with a view or virtual table in S3 which would give you the same level of indirection and API abstraction provided by this mechanism.
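
To make the indirection concrete, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table names (events_v1, events_v2_old, events_v2_new), the view name (events) and the helper user_query are made up for illustration; the storage layout changes underneath while the consumer's query stays the same.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Version 1 of the storage layout: a single table behind a stable view name.
con.execute("CREATE TABLE events_v1 (id INTEGER, name TEXT)")
con.executemany("INSERT INTO events_v1 VALUES (?, ?)", [(1, "a"), (2, "b")])
con.execute("CREATE VIEW events AS SELECT id, name FROM events_v1")


def user_query(con):
    # The consumer only ever references the view, never the tables behind it.
    return con.execute("SELECT id, name FROM events ORDER BY id").fetchall()


before = user_query(con)

# Storage change: data is now split across two tables with renamed columns.
con.execute("CREATE TABLE events_v2_old (event_id INTEGER, label TEXT)")
con.execute("CREATE TABLE events_v2_new (event_id INTEGER, label TEXT)")
con.execute("INSERT INTO events_v2_old SELECT id, name FROM events_v1")
con.execute("INSERT INTO events_v2_new VALUES (3, 'c')")

# Redefine the view in concert with the change, then retire the old table.
con.execute("DROP VIEW events")
con.execute(
    """
    CREATE VIEW events AS
      SELECT event_id AS id, label AS name FROM events_v2_old
      UNION ALL
      SELECT event_id AS id, label AS name FROM events_v2_new
    """
)
con.execute("DROP TABLE events_v1")

after = user_query(con)
print(before)
print(after)
```

The same user_query call works before and after the change. Only the view definition had to know about the new layout.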

Where DuckDB will have an advantage is in processing speed for OLAP-type queries. I don't know what the current state of SQLite virtual tables for Parquet files on S3 is, but DuckDB has a number of optimisations for that case, like reading only the required columns and row groups through HTTP range requests. SQLite has a row-oriented processing model, so I suspect it wouldn't do that in the same way unless there is a specific vtable extension for it.
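
As a rough illustration of why that matters, here is a toy sketch in plain Python. It is not Parquet's real format and not DuckDB's code. It assumes a made-up layout where each row group stores its columns contiguously, plus an index of (offset, length, min, max) per column chunk. The names write_columnar, RangeReader and sum_where are hypothetical, and RangeReader just stands in for range requests against remote storage while counting bytes fetched.

```python
import struct


def write_columnar(rows, group_size):
    """Pack integer rows column by column within each row group.

    Returns (blob, index), where index maps (group, column) to
    (offset, length, min, max) for that column chunk.
    """
    blob = bytearray()
    index = {}
    ncols = len(rows[0])
    for g, start in enumerate(range(0, len(rows), group_size)):
        group = rows[start:start + group_size]
        for c in range(ncols):
            values = [r[c] for r in group]
            chunk = struct.pack(f"<{len(values)}q", *values)
            index[(g, c)] = (len(blob), len(chunk), min(values), max(values))
            blob += chunk
    return bytes(blob), index


class RangeReader:
    """Stand-in for ranged reads against remote storage; counts bytes fetched."""

    def __init__(self, blob):
        self.blob = blob
        self.bytes_read = 0

    def read(self, offset, length):
        self.bytes_read += length
        return self.blob[offset:offset + length]


def sum_where(reader, index, filter_col, lo, hi, sum_col):
    """SUM(sum_col) WHERE lo <= filter_col <= hi, fetching only needed chunks."""
    total = 0
    for g in sorted({group for group, _ in index}):
        off, length, cmin, cmax = index[(g, filter_col)]
        if cmax < lo or cmin > hi:
            continue  # min/max stats rule out the whole row group
        fvals = struct.unpack(f"<{length // 8}q", reader.read(off, length))
        off, length, _, _ = index[(g, sum_col)]
        svals = struct.unpack(f"<{length // 8}q", reader.read(off, length))
        total += sum(s for f, s in zip(fvals, svals) if lo <= f <= hi)
    return total


rows = [(i, i * 10, i * 100) for i in range(8)]  # 3 columns, 8 rows
blob, index = write_columnar(rows, group_size=4)
reader = RangeReader(blob)
result = sum_where(reader, index, filter_col=0, lo=5, hi=6, sum_col=1)
print(result, reader.bytes_read, len(blob))
```

In this toy case the query fetches two column chunks from one row group and never touches the third column or the first row group. A row-oriented layout would have to read every row to answer the same question.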

You can get a comparable benefit for data stored in a SQLite database itself with the project from the following blog post, but that wouldn't apply to collections of Parquet files: https://phiresky.github.io/blog/2021/hosting-sqlite-database...

robertlagrant · on June 1, 2024

Yes - makes sense. I imagine I'd only do this with SQLite on a SQLite database, rather than on heterogeneous data sources.