Hacker News

There is no “true CSV”. https://en.wikipedia.org/wiki/Comma-separated_values:

“The CSV file format is not fully standardized. Separating fields with commas is the foundation, but commas in the data or embedded line breaks have to be handled specially. Some implementations disallow such content while others surround the field with quotation marks, which yet again creates the need for escaping if quotation marks are present in the data.

The term "CSV" also denotes several closely-related delimiter-separated formats that use other field delimiters such as semicolons.[2] These include tab-separated values and space-separated values. A delimiter guaranteed not to be part of the data greatly simplifies parsing.

Alternative delimiter-separated files are often given a ".csv" extension despite the use of a non-comma field separator. This loose terminology can cause problems in data exchange. Many applications that accept CSV files have options to select the delimiter character and the quotation character. Semicolons are often used instead of commas in many European locales in order to use the comma as the decimal separator and, possibly, the period as a decimal grouping character.”

https://en.wikipedia.org/wiki/Comma-separated_values#Standar... mentions a few standards for CSV, one of which is the MIME type text/csv, standardized in RFC 4180.



I think it's valid to argue that you shouldn't be able to put some of commas, quotes, and newlines inside fields at all. The same goes for arguing about comma versus semicolon as the delimiter.

But that doesn't extend to using backslash escapes in something that's legitimately trying to be CSV. That's someone getting confused and implementing a mix of data formats, or trying to be clever and making an extended CSV format.
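To make the distinction concrete, here is a minimal sketch using Python's standard csv module. The sample records and the parse helper are made up for illustration. It reads the same record written two ways: with doubled quotes (the RFC 4180 convention) and with backslash escapes. A reader with default settings recovers the first as intended but not the second, which only parses as intended once the reader is explicitly told to treat backslash as an escape character.

```python
import csv
import io

# Hypothetical record: the middle field is meant to be the text  say "hi"
doubled_style = 'a,"say ""hi""",b\r\n'        # RFC 4180 style: quotes are doubled
backslash_style = 'a,"say \\"hi\\"",b\r\n'    # non-standard: quotes are backslash-escaped


def parse(text, **dialect):
    """Parse the first record of text with csv.reader and the given dialect options."""
    return next(csv.reader(io.StringIO(text, newline=''), **dialect))


# Default reader settings expect doubled quotes, so this recovers the field.
rfc_row = parse(doubled_style)

# The same default settings do not understand backslash escapes,
# so the middle field does not come back as the intended text.
misread_row = parse(backslash_style)

# Opting in to a backslash-escaped dialect makes the second form parse as intended.
escaped_row = parse(backslash_style, escapechar='\\', doublequote=False)

print(rfc_row)
print(misread_row)
print(escaped_row)
```

The point is only that the two conventions are different dialects: a file written in one needs a reader configured for that one.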


It’s valid to argue that, but that means you can’t use CSV for many real-world data sets.

That, in turn, means you almost cannot use CSV in any robust solution. Even if, today, your input doesn’t have commas, quotes or newlines, can you guarantee it won’t tomorrow, next year, etc?


> Even if, today, your input doesn’t have commas, quotes or newlines, can you guarantee it won’t tomorrow, next year, etc?

But... but those are the ones I listed as real special characters, unlike backslash. I don't understand the question.


Yes? What do I not know that makes this question harder than it seems to me?


Every field is quoted. Every field is quote-escaped. That's all you need.
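A rough sketch of that rule with Python's standard csv module, assuming the consumer also follows the doubled-quote convention. The sample rows and the helper names write_rows and read_rows are made up for illustration. Every field is written inside quotes, embedded quotes are doubled, and commas and newlines in the data survive a round trip.

```python
import csv
import io


def write_rows(rows):
    """Serialize rows with every field quoted; embedded quotes are doubled."""
    buf = io.StringIO(newline='')
    # QUOTE_ALL quotes every field; doubling of embedded quotes is the writer's default.
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL)
    writer.writerows(rows)
    return buf.getvalue()


def read_rows(text):
    """Parse text back into a list of rows using the default dialect."""
    return list(csv.reader(io.StringIO(text, newline='')))


# Hypothetical data containing each of the troublesome characters.
rows = [
    ['id', 'note'],
    ['1', 'has, comma'],
    ['2', 'has "quotes"'],
    ['3', 'line one\nline two'],
]

encoded = write_rows(rows)
decoded = read_rows(encoded)

print(encoded)
print(decoded == rows)
```

Note that everything comes back as a string, so any typing of the values is up to the consumer.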


There are simpler and better approaches. Fortunately for us all, IP datagrams are not JSON-encoded, for example.



