Those are the simple things to worry about; in the real world it gets even more complicated. So complicated, in fact, that writing our own parser was the only way to go.
So what are some of the more advanced challenges?
* Banking systems that treat CSV export as little more than a line-based dump file, with no regard for consistent formatting. If the banking backend was updated, the format might change within a single file from one line to the next (see the first sketch after this list)
* Misconfigured data-dumping pipelines that parse CSV from another system the wrong way and re-emit it, escaped, in yet another format, for instance putting a complete escaped line into a single field. Your parser has to detect that there is a CSV embedded within the CSV (see the second sketch after this list)
* Dumping pipelines that treat \r\n as all kinds of silly line breaks and re-embed the resulting fragments wrapped in quotes so they look like real data
* Completely new, invented specs of what CSV could mean
* Mainframe character sets from the last millennium, such as EBCDIC (see the last sketch below)
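
When the format can drift from one line to the next, one way to cope is to re-detect the dialect per line instead of once per file. Here is a minimal Python sketch using the standard library's `csv.Sniffer`; the function name and the fallback to the default `excel` dialect are assumptions for illustration, not the actual strategy of our parser:

```python
import csv

def parse_drifting_csv(path):
    """Yield one parsed row per physical line, re-sniffing the
    dialect each time, since the format may change mid-file."""
    with open(path, newline="") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            try:
                dialect = csv.Sniffer().sniff(line)
            except csv.Error:
                # Sniffing fails on ambiguous lines; fall back to defaults.
                dialect = csv.excel
            yield next(csv.reader([line], dialect=dialect))
```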
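
Detecting a CSV embedded within a CSV can be done heuristically: if a parsed row collapses into a single field that still contains the delimiter, that field is probably an escaped line and deserves a second parsing pass. A sketch under that assumption (the heuristic and the function name are illustrative):

```python
import csv
import io

def unwrap_nested_row(row, delimiter=","):
    """Heuristic: a row that collapsed into one field containing the
    delimiter is likely an escaped CSV line; parse it a second time."""
    if len(row) == 1 and delimiter in row[0]:
        return next(csv.reader(io.StringIO(row[0]), delimiter=delimiter))
    return row

# The quoted field 'a,b,c' unwraps into three real fields:
row = next(csv.reader(io.StringIO('"a,b,c"')))
print(unwrap_nested_row(row))  # ['a', 'b', 'c']
```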
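
For exports in mainframe character sets, the bytes have to be transcoded before any parsing starts. The sketch below tries a few candidate encodings (cp037 and cp500 are EBCDIC code pages that ship with Python's codec registry) and keeps the first decoding that looks like CSV; the candidate list and the plausibility check are assumptions, not a general-purpose charset detector:

```python
# UTF-8 first, then two common EBCDIC code pages.
CANDIDATES = ["utf-8", "cp037", "cp500"]

def decode_legacy_export(data: bytes) -> str:
    for encoding in CANDIDATES:
        try:
            text = data.decode(encoding)
        except UnicodeDecodeError:
            continue  # UTF-8 can fail; single-byte EBCDIC rarely does
        # Crude plausibility check: a decoded CSV export should
        # contain a common delimiter and line breaks.
        if ("," in text or ";" in text) and "\n" in text:
            return text
    raise ValueError("no candidate encoding produced plausible CSV")
```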