> I much prefer my tools not to be "90%-smart", but predictable. I understand yo...

chasil · on Aug 25, 2021

Wouldn't it be wonderful if we actually used ASCII as it was designed?

    Oct   Dec   Hex   Char
    ----------------------------------------
    034   28    1C    FS  (file separator)
    035   29    1D    GS  (group separator)
    036   30    1E    RS  (record separator)
    037   31    1F    US  (unit separator)

https://ronaldduncan.wordpress.com/2009/10/31/text-file-form...

svieira · on Aug 25, 2021

The problem with this is the same problem that CSV has to solve, though: there's no escape character specified in ASCII, so you can't have a unit that contains any of these 4 characters or else you'll break the parser.
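To make both sides concrete, here is a minimal sketch of a separator-based format, assuming US (0x1F) delimits fields and RS (0x1E) delimits records. The `encode` and `decode` helpers are hypothetical names for illustration, not part of any standard library or of the linked article.

```python
US = "\x1f"  # unit separator, used here as the field delimiter (assumption)
RS = "\x1e"  # record separator, used here as the row delimiter (assumption)

def encode(rows):
    """Join fields with US and records with RS. No escaping is applied."""
    return RS.join(US.join(fields) for fields in rows)

def decode(text):
    """Split on RS, then on US. Only inverts encode() for separator-free data."""
    return [record.split(US) for record in text.split(RS)]

# Commas and newlines inside a field need no quoting at all.
clean = [["name", "note"], ["alice", "hello, world\nsecond line"]]
assert decode(encode(clean)) == clean

# A field that itself contains US is silently split in two on the way back.
dirty = [["alice", "bad" + US + "field"]]
assert decode(encode(dirty)) == [["alice", "bad", "field"]]
```

The round trip is lossless only as long as no field contains one of the separator characters, which is exactly the restriction described above.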

memetomancer · on Aug 25, 2021

In turn, I get what you are saying, but in this case it is not a trivial problem. CSV files seem simple on the surface but there are all sorts of gotchas.

For example, there's plenty of variation between platforms/applications when it comes to just terminating a line. Are we using CR, LF, CR+LF, LF+CR, NL, RS, EOL? What do we do when the source file is produced by an app that uses one approach but doesn't care about the others (allows their occurrence)?

If those others should appear in the data, would our "90%-smart" tool make the wrong determination on line termination for the whole file? Would everything just break, or would this tool churn along and wreck all the data? How long until you noticed?

By my estimation, the "90%-smart" tool would be about 30% dependable unless used only with a known source and format, meaning it wouldn't need to be smart in the first place.
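The line-termination gotcha can be sketched as follows. `guess_terminator` is a hypothetical heuristic (pick whichever terminator occurs most often) and does not describe any particular tool; the sample data assumes a source that ends records with CR+LF but allows bare LF inside a field.

```python
def guess_terminator(data):
    """Hypothetical 'smart' heuristic: choose the most frequent terminator."""
    crlf = data.count("\r\n")
    lf = data.count("\n") - crlf   # bare LF only
    cr = data.count("\r") - crlf   # bare CR only
    counts = {"\r\n": crlf, "\n": lf, "\r": cr}
    return max(counts, key=counts.get)

# Three records terminated by CR+LF; the second field of record 2
# contains four bare LF characters that the source app allowed.
data = "id,comment\r\n1,a\nb\nc\nd\ne\r\n2,ok\r\n"

guess = guess_terminator(data)
records_right = [r for r in data.split("\r\n") if r]
records_guessed = [r for r in data.split(guess) if r]

assert guess == "\n"               # bare LF outnumbers CR+LF, so the guess is wrong
assert len(records_right) == 3     # what the source actually wrote
assert len(records_guessed) == 7   # what the heuristic produces, without any error
```

Nothing fails loudly here: the guessed split simply yields more, shorter rows, some with a stray trailing CR.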

enriquto · on Aug 25, 2021

> CSV files seem simple on the surface but there are all sorts of gotchas.

My point is that supporting "general CSV files" is useless. Restricting your tooling to "simple CSV files" is good. But my opinions are not very representative. I also think that it is perfectly acceptable for a shell script to fail badly when it encounters filenames with spaces.
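As a rough illustration of that stance, a tool limited to "simple CSV files" might refuse anything outside a narrow subset instead of guessing. The definition of "simple" below (comma-separated, LF-terminated, no quotes, no CR, same field count on every row) and the name `parse_simple_csv` are assumptions made for this sketch.

```python
def parse_simple_csv(text):
    """Parse comma-separated, LF-terminated rows and reject everything else."""
    if '"' in text or "\r" in text:
        raise ValueError("not a simple CSV file: quote or CR found")
    rows = [line.split(",") for line in text.split("\n") if line]
    if len({len(row) for row in rows}) > 1:
        raise ValueError("not a simple CSV file: rows differ in field count")
    return rows

print(parse_simple_csv("a,b\n1,2\n"))
```

The parser either returns exactly what a plain split would give or raises, so there is no input on which it silently does something clever.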