
CSV is far from perfect, but it's nice that I can easily work with CSV files without needing any libraries. All I need is file I/O and the ability to split strings. It doesn't get much simpler.

I'll admit, though, that "import json" and then being able to essentially convert the entire file into a dictionary is nice if the data has more structure to it.
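A minimal sketch of the contrast being described, using only the Python standard library (the sample data here is made up for illustration):

```python
import json

# Naive CSV reading: file I/O plus string splitting, which is all the
# comment above asks for. Fine only for simple, well-behaved data with
# no quoted fields or embedded commas.
csv_text = "name,score\nalice,10\nbob,7\n"
rows = [line.split(",") for line in csv_text.strip().splitlines()]
header, data = rows[0], rows[1:]
print(data)  # [['alice', '10'], ['bob', '7']]

# JSON: one call turns the whole document into nested dicts/lists.
json_text = '{"alice": {"score": 10}, "bob": {"score": 7}}'
doc = json.loads(json_text)
print(doc["alice"]["score"])  # 10
```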



The real advantage of CSV, in my mind, is that if the CSV is valid and normal then it's going to be a rectangular dataset (ignoring semantics within the dataset).

If I import JSON data I have no idea what shape the result will be in, and it requires a separate standard to let me know about columns and rows and validation can get complicated.


CSV is that way too. Nothing says each row has to have the same number of columns, and nothing tells you what those columns really represent (there's no schema). You can use the first line of the CSV doc to name the fields, but that's a convention, not a standard. Without a schema it's easy to lose the metadata of what the file is trying to represent: is this column just a bunch of numbers or a date format, for example? CSV is OK for importing and exporting data across systems that already know the format without the help of a schema, but anything else and you run into a pile of edge cases. Even importing a CSV file into a spreadsheet usually works, but context is often lost.
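Both points — ragged rows and untyped columns — are easy to demonstrate with a small invented example:

```python
# Nothing in CSV forbids ragged rows or declares column types.
csv_text = "id,value,date\n1,3.5,2021-01-02\n2,7\n"
rows = [line.split(",") for line in csv_text.strip().splitlines()]
lengths = {len(r) for r in rows}
print(lengths)  # {2, 3} -- the "rectangle" is only a convention

# And is "2021-01-02" a date or just a string? Without an external
# schema, every consumer has to guess.
```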

Frankly, I love the format.


It's funny to complain about CSV when JSON is also a minefield: http://seriot.ch/parsing_json.php

CSV is fine .. usually


CSV is still easier to parse because the C++ dudes still refuse to implement some kind of nice operator-overloaded interface like

    #include <json>

    std::json myjson("{\"someArray\": [1,2,3,4,{\"a\": \"b\"}]}");
    std::cout << (std::string)myjson["someArray"][4]["a"];
and the result is we have 50 different rogue JSON libraries instead of an STL solution. Until the STL folks wake up, boost::split can deal with the CSV.



ooh this is nice. STL should adopt it


Thank you!


A string split function is a poor choice if there's any possibility the CSV file contains quoted fields. Robust handling of both CSV and JSON requires a parser. In my experience, CSV can actually be trickier than JSON to parse because there are so many edge cases, alternatives, and ambiguities.
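A quick sketch of the quoting problem, using Python's stdlib csv module as the "real parser" (the sample line is made up):

```python
import csv
import io

line = 'id,comment\n1,"hello, world"\n'

# Naive split breaks on the comma inside the quoted field:
naive = line.strip().splitlines()[1].split(",")
print(naive)  # ['1', '"hello', ' world"'] -- three fields, not two

# A real parser tracks the quoting and recovers two fields:
parsed = list(csv.reader(io.StringIO(line)))
print(parsed[1])  # ['1', 'hello, world']
```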


Nice as long as JSON is valid and not too big.


> All I need is file I/O and the ability to split strings.

...until there is a newline inside a field.

The moronic quoting mechanism of CSV is one half of the problem; people like you, who try to parse it by "just splitting strings", are the other half. The third half is that it's locale-dependent and, after 30+ years, people still don't use Unicode.
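The newline-in-a-field failure mode, sketched with Python's stdlib csv module on made-up data:

```python
import csv
import io

data = 'id,note\n1,"line one\nline two"\n2,ok\n'

# splitlines() sees four "rows" because it can't distinguish a record
# boundary from a newline inside a quoted field:
print(len(data.strip().splitlines()))  # 4

# csv.reader tracks the quoting and recovers the three real records:
records = list(csv.reader(io.StringIO(data)))
print(len(records))      # 3
print(records[1])        # ['1', 'line one\nline two']
```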


Never write your own parser, especially not as just string.split(), and when possible don't use C(omma)SV formats but C(haracter)SV, aka DSV: https://en.wikipedia.org/wiki/Delimiter-separated_values

There are non-printable, non-typable characters specifically defined as separators (ASCII 28-31), with Unicode equivalents.
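A sketch of the approach: because the ASCII unit separator (31) and record separator (30) never occur in normal text, plain split() round-trips without any quoting (data invented for illustration):

```python
# ASCII control characters as delimiters -- no quoting mechanism needed,
# since the separators cannot appear inside ordinary field text.
US, RS = "\x1f", "\x1e"  # unit separator, record separator

records = [["id", "note"], ["1", "hello, world\nwith a newline"]]
encoded = RS.join(US.join(fields) for fields in records)
decoded = [rec.split(US) for rec in encoded.split(RS)]
print(decoded == records)  # True -- commas and newlines survive intact
```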


As I said above, I'm not writing software where that would be a problem. In general, if I were writing a commercial application or something in production where I don't control all the inputs, I would agree 100%, but I'm lucky not to have those problems. I could use a bunch of libraries and additional code to try to catch errors I don't have, or I could fix my actual problems and move on.

I appreciate the perspective though for sure. I'm guilty of the same thing on HN, where I assume people have the same uses as me when they're writing code that has to be extremely robust or blazingly fast.


You've assumed an awful lot about my use cases. The data I deal with in .CSV form is always pre-processed and doesn't have any of the minefield occurrences you've mentioned. There can't be a newline or anything like that in an input. In my decade of using .CSV files daily, I've only had one tertiary system where that is a problem.

Also, when doing interactive work, it's a bit different than writing production IT software.


So you're not actually parsing CSV and your comment was off-topic. Thank you for the clarification.


Define parsing. I'm still going through GB of data in thousands of files and building complex reports, and data structures for scientific analysis. Just because I don't need thousands of lines of code to navigate edge cases doesn't mean I'm not parsing .CSV files.



