
CSV is far from perfect, but it's nice that I can easily work with CSV files without needing any libraries. All I need is file I/O and the ability to split strings. It doesn't get much simpler.

I'll admit, though, that "import json" and then being able to essentially convert the entire file into a dictionary is nice if the data has more structure to it.
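A minimal sketch of the contrast being described, using only the Python standard library (the sample data here is made up for illustration):

```python
import json

# Naive CSV reading: file I/O plus string splitting, which is all the
# comment above asks for. Fine only for simple, well-behaved data with
# no quoted fields or embedded commas.
csv_text = "name,score\nalice,10\nbob,7\n"
rows = [line.split(",") for line in csv_text.strip().splitlines()]
header, data = rows[0], rows[1:]
print(data)  # [['alice', '10'], ['bob', '7']]

# JSON: one call turns the whole document into nested dicts/lists.
json_text = '{"alice": {"score": 10}, "bob": {"score": 7}}'
doc = json.loads(json_text)
print(doc["alice"]["score"])  # 10
```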



The real advantage of CSV, in my mind, is that if the CSV is valid and normal then it's going to be a rectangular dataset (ignoring semantics within the dataset).

If I import JSON data I have no idea what shape the result will be in, and it requires a separate standard to let me know about columns and rows and validation can get complicated.


CSV is that way too. Nothing says each row has to have the same number of columns, and nothing tells you what those columns really represent (there's no schema). You can use the first line of the CSV doc to name the fields, but that's a convention, not a standard. Without a schema it's easy to lose the metadata of what the file is trying to represent: is this column just a bunch of numbers or a date format, for example? CSV is OK for importing and exporting data across systems that already know the format without the help of a schema, but anything else and you run into a pile of edge cases. Even importing a CSV file into a spreadsheet usually works, but context is often lost.
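Both points — ragged rows and untyped columns — are easy to demonstrate with a small invented example:

```python
# Nothing in CSV forbids ragged rows or declares column types.
csv_text = "id,value,date\n1,3.5,2021-01-02\n2,7\n"
rows = [line.split(",") for line in csv_text.strip().splitlines()]
lengths = {len(r) for r in rows}
print(lengths)  # {2, 3} -- the "rectangle" is only a convention

# And is "2021-01-02" a date or just a string? Without an external
# schema, every consumer has to guess.
```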

Frankly, I love the format.


It's funny to complain about CSV when JSON is also a minefield: http://seriot.ch/parsing_json.php

CSV is fine .. usually


CSV is still easier to parse because the C++ dudes still refuse to implement some kind of nice operator-overloaded interface like

    #include <json>

    std::json myjson("{\"someArray\": [1,2,3,4,{\"a\": \"b\"}]}");
    std::cout << (std::string)myjson["someArray"][4]["a"];
and the result is we have 50 different rogue JSON libraries instead of an STL solution. Until the STL folks wake up, boost::split can deal with the CSV.



ooh this is nice. STL should adopt it


Thank you!


A string split function is a poor choice if there's any possibility the CSV file contains quoted fields. Robust handling of both CSV and JSON requires a parser. In my experience, CSV can actually be trickier than JSON to parse because there are so many edge cases, alternatives, and ambiguities.
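A quick sketch of the quoting problem, using Python's stdlib csv module as the "real parser" (the sample line is made up):

```python
import csv
import io

line = 'id,comment\n1,"hello, world"\n'

# Naive split breaks on the comma inside the quoted field:
naive = line.strip().splitlines()[1].split(",")
print(naive)  # ['1', '"hello', ' world"'] -- three fields, not two

# A real parser tracks the quoting and recovers two fields:
parsed = list(csv.reader(io.StringIO(line)))
print(parsed[1])  # ['1', 'hello, world']
```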


Nice as long as JSON is valid and not too big.


> All I need is file I/O and the ability to split strings.

...until there is a newline inside a field.

The moronic quoting mechanism of CSV is one half of the problem; people like you, who try to parse it by "just splitting strings", are the other half. The third half is that it's locale-dependent and, after 30+ years, people still don't use Unicode.
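The newline-in-a-field failure mode, sketched with Python's stdlib csv module on made-up data:

```python
import csv
import io

data = 'id,note\n1,"line one\nline two"\n2,ok\n'

# splitlines() sees four "rows" because it can't distinguish a record
# boundary from a newline inside a quoted field:
print(len(data.strip().splitlines()))  # 4

# csv.reader tracks the quoting and recovers the three real records:
records = list(csv.reader(io.StringIO(data)))
print(len(records))      # 3
print(records[1])        # ['1', 'line one\nline two']
```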


Never write your own parser, especially not as just string.split(), and when possible don't use C(omma)SV formats but C(haracter)SV, aka DSV: https://en.wikipedia.org/wiki/Delimiter-separated_values

There are non-printable, non-typable characters specifically defined as separators (ASCII 28-31), with Unicode equivalents.
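A sketch of the approach: because the ASCII unit separator (31) and record separator (30) never occur in normal text, plain split() round-trips without any quoting (data invented for illustration):

```python
# ASCII control characters as delimiters -- no quoting mechanism needed,
# since the separators cannot appear inside ordinary field text.
US, RS = "\x1f", "\x1e"  # unit separator, record separator

records = [["id", "note"], ["1", "hello, world\nwith a newline"]]
encoded = RS.join(US.join(fields) for fields in records)
decoded = [rec.split(US) for rec in encoded.split(RS)]
print(decoded == records)  # True -- commas and newlines survive intact
```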


As I said above, I'm not writing software where that would be a problem. In general, if I were writing a commercial application or something in production where I don't control all the inputs, I would agree 100%, but I'm lucky not to have those problems. I could use a bunch of libraries and additional code to try to catch errors I don't have, or I could fix my actual problems and move on.

I appreciate the perspective though for sure. I'm guilty of the same thing on HN, where I assume people have the same uses as me when they're writing code that has to be extremely robust or blazingly fast.


You've assumed an awful lot about my use cases. The data I deal with in .CSV form is always pre-processed and doesn't have any of the minefield occurrences you've mentioned. There can't be a newline or anything like that in an input. In my decade of using .CSV files daily, I've only had one tertiary system where that is a problem.

Also, when doing interactive work, it's a bit different than writing production IT software.


So you're not actually parsing CSV and your comment was off-topic. Thank you for the clarification.


Define parsing. I'm still going through GB of data in thousands of files and building complex reports, and data structures for scientific analysis. Just because I don't need thousands of lines of code to navigate edge cases doesn't mean I'm not parsing .CSV files.



