Hacker News

If your goal is to join large CSV files using a local program, the ideal tool is not SQLite but DuckDB:

https://www.duckdb.org/docs/current/sql/copy.html

DuckDB is a better fit for this job because it is a column store with a block-oriented, vectorized execution engine. That approach is orders of magnitude faster for batch operations over millions of rows at a time.

In contrast, SQLite would be orders of magnitude faster than DuckDB when you’re operating on one row at a time.
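To make the comparison concrete, the join itself is plain SQL in either engine. A minimal sketch using Python's stdlib sqlite3 module (with tiny sample CSVs standing in for the large files discussed); DuckDB's Python API accepts the same join, and can additionally scan the CSV files directly without a manual load step:

```python
import csv
import sqlite3

# Small sample CSVs, stand-ins for the large files being joined.
a_csv = "id,name\n1,alice\n2,bob\n"
b_csv = "id,score\n1,90\n2,85\n"

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE a (id INTEGER, name TEXT)")
con.execute("CREATE TABLE b (id INTEGER, score INTEGER)")

# The sqlite3 module has no built-in CSV reader, so the rows are loaded
# manually; in DuckDB the same join can reference the files directly, e.g.
#   SELECT * FROM 'a.csv' JOIN 'b.csv' USING (id)
for text, table in ((a_csv, "a"), (b_csv, "b")):
    rows = list(csv.reader(text.splitlines()))[1:]  # skip the header row
    con.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)

result = con.execute(
    "SELECT a.id, a.name, b.score FROM a JOIN b USING (id) ORDER BY a.id"
).fetchall()
print(result)  # [(1, 'alice', 90), (2, 'bob', 85)]
```

The per-row INSERT loop is exactly the kind of workload SQLite handles well; DuckDB's advantage shows up when the tables hold millions of rows and the join runs as one batched, columnar operation.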



I've used SQLite quite a few times, but never heard of DuckDB. Can anybody provide some more information about it?


Main author of DuckDB here, I did not expect to see this mentioned here. DuckDB is a relational DBMS geared towards efficiently handling large analytical-style workloads locally. It is similar to SQLite in the sense that it operates locally on your machine, is easy to install and run, and has zero dependencies. However, DuckDB uses modern processing paradigms (vectorized processing, columnar storage) that make it much faster when processing large amounts of data.

It's still at an early stage, but most of the functionality is there (full SQL support, permanent storage, ACID properties). Feel free to give it a try if you are interested. DuckDB has Python and R bindings, and a shell based on the sqlite3 shell. You can find installation instructions here: https://www.duckdb.org/docs/current/tutorials/installation.h...



