
Have a go with duckdb next time - you can query csv files without loading them first.


You can do that with SQLite too: https://til.simonwillison.net/sqlite/one-line-csv-operations

(DuckDB is a lot more ergonomic for that kind of thing though - it's really fantastic tech)
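The SQLite one-liner linked above uses the CLI's `.import`; the same idea can be sketched with Python's stdlib `sqlite3` and `csv` modules (the CSV content and table name `t` here are made up for illustration):

```python
import csv
import io
import sqlite3

# Hypothetical CSV content; in practice you'd read a real file.
csv_text = "name,age\nalice,30\nbob,25\n"

rows = list(csv.reader(io.StringIO(csv_text)))
header, data = rows[0], rows[1:]

# Import into an in-memory SQLite database, then query with SQL.
db = sqlite3.connect(":memory:")
cols = ", ".join(f'"{c}"' for c in header)
db.execute(f"CREATE TABLE t ({cols})")
placeholders = ", ".join("?" * len(header))
db.executemany(f"INSERT INTO t VALUES ({placeholders})", data)

result = db.execute(
    "SELECT name FROM t WHERE CAST(age AS INTEGER) > 26"
).fetchall()
print(result)  # [('alice',)]
```

Unlike the DuckDB approach, this does load the rows into the database first, which is exactly the trade-off being discussed.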


Cool! Although I believe duckdb can do it on disk / out of memory, so querying huge files is possible. I also like its syntax; I tend to CREATE VIEW mycsv AS SELECT * FROM 'my.csv' (or similar). Then I think you can select or join even across files, although I haven't gotten that far yet.
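The view-over-CSV pattern described above might look like this in DuckDB's SQL (the file names and column names here are hypothetical, chosen just to illustrate a cross-file join):

```sql
-- Create views over CSV files without importing them first
CREATE VIEW orders    AS SELECT * FROM 'orders.csv';
CREATE VIEW customers AS SELECT * FROM 'customers.csv';

-- Join across the two files as if they were ordinary tables
-- (customer_id / id / name are assumed columns)
SELECT c.name, COUNT(*) AS n_orders
FROM orders o
JOIN customers c ON o.customer_id = c.id
GROUP BY c.name;
```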


Unlike sqlite, DuckDB is very complex and much less polished.


The spirit of the question, and the answer being sought, is to avoid loading all of the data into memory first.


I believe duckdb does not load the whole csv file into memory. It will load a few rows to find column headers and guess data types.


You’re just pettifogging the situation. The spirit of the question is to find a solution that is acceptable/performant algorithmically.

Certainly, there are hiring panels that appreciate these sorts of tricks for getting around the problem, usually citing “out of the box” thinking, but the majority would probably just say “do it without that tool” or mark you as a fail.



