
If you’re trying to use SQLite for this, it’ll be much faster if you import the data first, and then create your indexes.
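
A minimal sketch of that pattern with Python's sqlite3 module; the table, columns, and generated rows here are placeholders for whatever you're actually loading:

    import sqlite3

    conn = sqlite3.connect("data.db")
    conn.execute("CREATE TABLE IF NOT EXISTS events (ts INTEGER, user_id INTEGER, payload TEXT)")

    # Stand-in for the real data source (CSV, API dump, etc.)
    rows = ((i, i % 1000, "x") for i in range(1_000_000))

    # Load everything in one transaction, with no indexes defined yet
    with conn:
        conn.executemany("INSERT INTO events (ts, user_id, payload) VALUES (?, ?, ?)", rows)

    # Build the index afterwards, in one pass over the loaded data
    conn.execute("CREATE INDEX idx_events_user ON events(user_id)")

    conn.close()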

Also check out DuckDB, which is a similar tool tailored for data science; I haven’t used it but I’ve heard good things. https://duckdb.org/



The other tricks to making SQLite fast (rough sketch after the list):

* Ensure you're in WAL mode. https://sqlite.org/wal.html

* Use prepared statements and transactions.

* Set the cache_size pragma large enough to ideally hold the whole DB in RAM: https://sqlite.org/pragma.html#pragma_cache_size

* Run optimize and vacuum periodically (especially if the data is transactional): https://sqlite.org/pragma.html#pragma_optimize

* Pick the synchronous mode you're comfortable with: https://sqlite.org/pragma.html#pragma_synchronous
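
Pulling a few of those together in Python's sqlite3 (a rough sketch; the filename, table, cache size, and synchronous level are placeholders you'd tune for your own workload and durability needs):

    import sqlite3

    conn = sqlite3.connect("data.db")
    conn.execute("CREATE TABLE IF NOT EXISTS kv (k INTEGER, v TEXT)")

    conn.execute("PRAGMA journal_mode=WAL")    # write-ahead logging
    conn.execute("PRAGMA cache_size=-512000")  # negative = KiB, so ~500 MB of page cache
    conn.execute("PRAGMA synchronous=NORMAL")  # FULL is safest, OFF is fastest

    # Parameterized statements are prepared and cached by the module;
    # wrapping the writes in one transaction avoids a commit per row.
    rows = [(1, "a"), (2, "b")]                # stand-in data
    with conn:
        conn.executemany("INSERT INTO kv (k, v) VALUES (?, ?)", rows)

    conn.execute("PRAGMA optimize")            # cheap; run periodically or at close
    conn.execute("VACUUM")                     # rewrites the file; run occasionally, not every time
    conn.close()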


> Ensure you're in WAL mode.

That was what I thought as well, but in this case non-WAL turned out to be faster. Not yet sure why that is.


Dunno why it'd be surprising: WAL gives you higher concurrency, but that comes with overhead. Writes have to hit the WAL and then be checkpointed into the database, and reads have to check both the database and the WAL.

WAL can be faster for small amounts of data, but once there's enough data that the WAL has to be checkpointed mid-processing, you're better off skipping it entirely.


That indexing perf isn't unique to SQLite; it's true for PostgreSQL too.


It's true for pretty much any RDBMS - mostly an artifact of the B-tree implementation: inserting into an existing index rebalances the tree row by row with random page writes, while building the index after the bulk load sorts the keys once and writes the tree in a single pass.



