
You can write what "looks" like CSV to you, but there are no guarantees it will import correctly.

The problem is 10x worse when you get CSV from one source and rely on another process to load it. I fought this problem for several days going from NetSuite to Snowflake via CSV.



Can you give an example? The rules for CSV files are so simple I'm struggling to imagine a case where something looks correct but in fact isn't correct.


Non-standard delimiters. Escaping of delimiters in fields - sometimes with a backslash, sometimes by doubling (""), sometimes not at all. Double newlines.

Poor handling by standard CSV libraries: either they can't read the file, or they can't produce one that some downstream process will accept.
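
For example, the same quoted value under two of those escaping conventions, read with Python's stdlib csv module - a sketch, the sample data is made up:

  import csv, io

  # RFC 4180 style: embedded quotes are doubled
  rfc_style = '"He said ""hi""",ok\n'
  # Backslash style: embedded quotes are escaped with \
  backslash = '"He said \\"hi\\"",ok\n'

  # The default dialect only understands the doubled-quote convention...
  print(next(csv.reader(io.StringIO(rfc_style))))
  # ...the backslash one needs different knobs, or it comes out mangled.
  print(next(csv.reader(io.StringIO(backslash),
                        doublequote=False, escapechar="\\")))
  # Both print ['He said "hi"', 'ok'] -- but only with the right settings.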


That sounds like the problem of badly formatted CSV, not a problem with CSV per se.

If you stick to one delimiter, and that delimiter is a comma, quote any field containing the delimiter with double quotes, and escape embedded double quotes by doubling them, well, you have written CSV that is correct, looks correct, and will be parsed correctly by literally every CSV parser.
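
In code, those rules come down to something like this (a minimal sketch, not a hardened exporter):

  def csv_field(value):
      # Quote the field if it contains a comma, a quote, or a newline,
      # and escape embedded quotes by doubling them (RFC 4180).
      s = str(value)
      if any(c in s for c in ',"\n\r'):
          return '"' + s.replace('"', '""') + '"'
      return s

  def csv_row(fields):
      return ",".join(csv_field(f) for f in fields) + "\r\n"

  print(csv_row(["Smith, John", 'He said "hi"', "Manager"]), end="")
  # "Smith, John","He said ""hi""",Manager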


Pretty much the RFC: https://datatracker.ietf.org/doc/html/rfc4180

Parsers are trickier if you want to be lenient, but exporters are dead simple.
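
Dead simple indeed if you lean on a stdlib writer, e.g. Python's csv module (a sketch; the filename is made up):

  import csv

  # csv.writer applies the RFC 4180 quoting rules for you.
  with open("out.csv", "w", newline="") as f:
      w = csv.writer(f)
      w.writerow(["name", "position"])
      w.writerow(["Smith, John", 'He said "hi"'])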


> That sounds like the problem of badly formatted CSV

That’s what CSV is. That’s what happens when you ingest CSVs whose production you don’t control.

> If you [ignore everything people literally clamour for in these comments and praise csv for]

Yes, I also like ponies.


> "That’s what CSV is."

That's really not a serious argument against CSV. Since you paraphrase in a silly way, I can do it too! Your "argument" is "Badly formatted files exist, therefore CSV bad".

Everyone "against CSV" seems to be arguing against badly formatted CSV, and leaping to the conclusion that "CSV is just bad" without much more to say about it. I'm sorry that badly formatted CSV gave you a bad time, but the format is fine and gets its job done.

"It doesn't have x, y or z feature therefore no one should be using it ever" is kind of a dumb argument, honestly.


> Your "argument" is "Badly formatted files exist, therefore CSV bad".

The argument is actually that the badly formatted CSV files have taken over, therefore CSV is bad. You can't reject them, so your import becomes unreliable.


Me, a naive idiot: CSV is simple I will write my own exporter because I am clever

Me, 20 minutes later: Heh that was easy I am a genius

Me, 21 minutes later: Unicode is ruining my life T_T

Don't get me wrong, I really like CSV because it's so primitive and works so well if you are disciplined about it. But it's easy to get something working on a small dataset and forget all the other possibilities only to faceplant as soon as you step outside your front door. In the case above my experience with dealing with CSV data from other people made me arrogant, when I should have just taken a few minutes to learn my way around a mature library.


How does Unicode present a problem?

In UTF-8, the bytes for a comma and a quote only ever appear as those characters. By design, they never appear as parts of multibyte sequences.

If you have Unicode problems, then you have Unicode problems, but they wouldn't seem to be CSV problems...? Unless you're being incredibly sloppy in your programming and outputting double-byte UTF-16 strings surrounded by single-byte commas and quotes or something...?
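
A quick way to convince yourself, assuming Python and an arbitrary sample string:

  # In UTF-8, continuation bytes are always >= 0x80 and lead bytes of
  # multibyte sequences are >= 0xC0, so the ASCII bytes for ',' (0x2C)
  # and '"' (0x22) can only ever mean a literal comma or quote.
  s = 'naïve, 名前, "quoted"'
  encoded = s.encode("utf-8")
  assert encoded.count(b",") == s.count(",")
  assert encoded.count(b'"') == s.count('"')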


  name,position
  "Smith, John"‚Manager



Lots of edge cases that aren't always handled to spec on both sides of the import/export. That's my experience, at least.


how're,you,handling,quotes?'


If you're manually generating your own CSV files, you probably know what kind of data you are generating and consequently whether your data is going to contain commas. If commas and newlines don't exist in your data, then you can safely ignore quoting rules when generating CSV files. I know that I've generated CSVs in the past and rather than figuring out the correct way to quote the strings, I just removed any inconvenient characters without any loss to the data at all. Obviously this is not "correct" but you don't have to implement cases if you know they won't show up.
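
Something like this, say - a throwaway sketch of that approach (fine when you control the data, wrong in general):

  def dirty_row(fields):
      # Not RFC 4180: instead of quoting, just strip the characters
      # that would break a naive comma/newline-based parser.
      return ",".join(str(f).replace(",", " ").replace("\n", " ")
                      for f in fields)

  print(dirty_row(["Smith, John", "Manager"]))
  # Smith  John,Manager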


You use the right ASCII control characters - the record and field separators (30 and 31) - and avoid hacks like comma, pipe, and newline.
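
A sketch of that approach, using the raw control characters (the helper name is made up):

  # US (0x1F) separates fields, RS (0x1E) separates records, so literal
  # commas and newlines in the data never need escaping.
  FS, RS = "\x1f", "\x1e"

  def write_records(rows):
      return RS.join(FS.join(fields) for fields in rows)

  blob = write_records([["name", "position"], ["Smith, John", "Manager"]])
  print(blob.split(RS)[1].split(FS))   # ['Smith, John', 'Manager']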


This is true, but a lot of data processing takes place in a context where frictionless export functionality is more important than a 100% guarantee of import compatibility. I'd rather ingest city = ",CHANGSHA,HUNAN" (real example!) than ingest nothing at all because my vendor doesn't have time to integrate a JSON serializer.


> I fought this problem for several days going from NetSuite to Snowflake via CSV.

Yeah if you see a CSV import feature without a billion knobs you know you're in for a world of hurt.

If you see a CSV import feature with a billion knobs, you're probably still in a world of hurt.
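
For reference, the usual pile of knobs - these are the Python stdlib csv dialect parameters, and bulk loaders tend to expose a similar list (the filename is made up):

  import csv

  reader = csv.reader(
      open("vendor_dump.csv", newline="", encoding="utf-8-sig"),
      delimiter=";",          # non-standard delimiter
      quotechar='"',
      escapechar="\\",        # backslash escapes instead of doubled quotes
      doublequote=False,
      skipinitialspace=True,  # tolerate "a, b, c" style spacing
      strict=False,           # don't raise on malformed quoting
  )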


> but there are no guarantees it will import correctly.

What do you mean, there are "no guarantees"? You are in charge! You know what data you're dumping, and you can see whether it imports cleanly. You can tailor the export to your use case.

That's not the same as getting a CSV from some dump, where you have limited (if any) control over the behavior.



