Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I always use sqlite3 for working with large CSV files:

    $ sqlite3
    $ .mode csv
    $ .import my.csv foo
    $ SELECT * FROM foo WHERE name = 'bar';
It reads the header in automatically for the field names and then stores all the values as strings.


Not bad. Here is the Windows equivalent with Excel separator:

    PS> gsudo choco install sqlite3

    PS > sqlite3
    sqlite> .mode csv
    sqlite> .separator ; \n
    sqlite> .import my.csv foo
    sqlite> SELECT * FROM foo WHERE name = 'bar';
For CSV that contain up to several 100k records native PowerShell solution is way faster and more practical:

    Get-Content my.csv | ConvertFrom-Csv | ? name -eq bar
FYI, I maintain miller choco package: https://community.chocolatey.org/packages/miller


Great tip. A simple utility script can be used for repeated use/customization: https://pastebin.com/qEkimk7W

Another advantage is that this can be executed to import as part of initialization and get to sqlite prompt with the table ready for sql execution:

    $ csv_to_table.sh sample-test.csv
Or it can be executed using heredoc for non-interactive execution as well:

    $ csv_to_table.sh sample-test.csv <<-CMD
    select * from sample_test limit 10;
    CMD
    ... output ...


This stack exchange answer makes that into a fairly simple bash alias: https://unix.stackexchange.com/questions/424555/how-to-inser...


I do the same. I wrote[0] a little wrapper around this workflow that does a few additional niceties: 1) normalizes column names and removes whitespace, 2) supports importing multiple csv files without importing the header line as a record.

[0]: https://github.com/psanford/csv2sqlite


holy moly, that's rad. definitely adding this to my toolbox, thanks!


duckdb is another good option as well.


[flagged]


If there is anything ugly here, it's your communication.

Aside from the fact that OP's example can run in a non-interactive mode, it makes absolutely no difference to the discussion.


You can't understand that there are people who know SQL and not R?


I don't think the comment refers to people, just to better ways of doing things. People learn.


In my experience people rarely learn anything after you insult them and call their work ugly.

If AP had a "better way of doing things" maybe they should have actually given us an example of what it looks like.

Right now, all I think about R is that snobby, gatekeeping assholes use it to do things I can already do easily in SQL.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: