
> I much prefer my tools not to be "90%-smart", but predictable.

I understand your preference, but please recognize that it is not universal. Some of us much prefer a super-simple tool whose defaults fail in a few particular cases and that requires special options to be completely general. That way you can use the heuristic defaults interactively (where you'll notice the errors easily) and write scripts with the more explicit form.
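
To make the "heuristic by default, explicit in scripts" idea concrete, here is a minimal sketch in Python. The helper name `read_rows` and the candidate delimiter list are my own assumptions for illustration; `csv.Sniffer` is the standard library's delimiter guesser, and it can guess wrong or raise `csv.Error` on unusual input.

```python
import csv
import io

def read_rows(text, delimiter=None):
    """Parse delimited text; guess the delimiter only when none is given.

    read_rows is a hypothetical helper, not part of any existing tool.
    """
    if delimiter is None:
        # Heuristic path: fine interactively, where a bad guess is easy to spot.
        delimiter = csv.Sniffer().sniff(text, delimiters=",;\t|").delimiter
    # Explicit path: a script passes delimiter=... and nothing is guessed.
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))

sample = "a;b;c\n1;2;3\n4;5;6\n"
guessed = read_rows(sample)                  # relies on the heuristic
explicit = read_rows(sample, delimiter=";")  # what a script would write
```

The point of the split is that the guess lives in exactly one place, so a script can opt out of it entirely.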




Wouldn't it be wonderful if we actually used ASCII as it was designed?

    Oct   Dec   Hex   Char
    ----------------------------------------
    034   28    1C    FS  (file separator)
    035   29    1D    GS  (group separator)
    036   30    1E    RS  (record separator)
    037   31    1F    US  (unit separator)
https://ronaldduncan.wordpress.com/2009/10/31/text-file-form...
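
As a rough sketch of what that would look like in practice (Python, with function names I made up for illustration), fields are joined with US and records with RS, so commas, quotes and newlines in the data need no special handling:

```python
US = "\x1f"  # unit separator: goes between fields
RS = "\x1e"  # record separator: goes between records

def encode(rows):
    """Join fields with US and records with RS (no quoting, no escaping)."""
    return RS.join(US.join(fields) for fields in rows)

def decode(text):
    """Inverse of encode, assuming no field contains US or RS."""
    return [record.split(US) for record in text.split(RS)]

rows = [
    ["name", "note"],
    ["Ann", 'likes commas, "quotes" and\nnewlines'],
]
round_trip = decode(encode(rows))
```

This only round-trips when the data itself never contains the separator characters.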


The problem with this is the same one CSV has to solve, though: ASCII specifies no escaping mechanism for these separators, so a unit can't contain any of the 4 characters without breaking the parser.
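
A small sketch of the failure, in Python with hypothetical helper names: a field that happens to contain US silently turns into two fields after a round trip. Rejecting such input up front (`encode_strict`) is one possible response, not the only one.

```python
FS, GS, RS, US = "\x1c", "\x1d", "\x1e", "\x1f"
SEPARATORS = FS + GS + RS + US

def encode_naive(rows):
    """Join fields with US and records with RS, with no checks at all."""
    return RS.join(US.join(fields) for fields in rows)

def decode(text):
    return [record.split(US) for record in text.split(RS)]

def encode_strict(rows):
    """Refuse any field containing one of the four separator characters."""
    for fields in rows:
        for field in fields:
            if any(ch in SEPARATORS for ch in field):
                raise ValueError(f"field contains a separator: {field!r}")
    return encode_naive(rows)

bad = [["a" + US + "b", "c"]]          # two fields going in
broken = decode(encode_naive(bad))     # three fields coming out

try:
    encode_strict(bad)
    rejected = False
except ValueError:
    rejected = True
```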


In turn, I get what you are saying, but in this case the problem is not trivial. CSV files seem simple on the surface, but there are all sorts of gotchas.

For example, there's plenty of variation between platforms/applications when it comes to just terminating a line. Are we using CR, LF, CR+LF, LF+CR, NL, RS, EOL? What do we do when the source file is produced by an app that uses one approach but doesn't care about the others (allows their occurrence)?

If those others should appear in the data, would our "90%-smart" tool make the wrong determination on line termination for the whole file? Would everything just break, or would this tool churn along and wreck all the data? How long until you noticed?
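
To illustrate, here is a minimal Python sketch (names are mine, and it only looks at CR, LF and CR+LF) that counts terminators in raw bytes. When more than one kind shows up, a tool can stop and ask instead of guessing one convention for the whole file:

```python
import re

def count_terminators(data: bytes) -> dict:
    """Count CR+LF pairs, lone CRs and lone LFs in raw file bytes."""
    counts = {"CRLF": 0, "CR": 0, "LF": 0}
    # Alternation order matters: try the two-byte CR+LF before a lone CR or LF.
    for match in re.finditer(rb"\r\n|\r|\n", data):
        token = match.group()
        if token == b"\r\n":
            counts["CRLF"] += 1
        elif token == b"\r":
            counts["CR"] += 1
        else:
            counts["LF"] += 1
    return counts

def is_mixed(data: bytes) -> bool:
    """True when more than one terminator style appears in the data."""
    return sum(1 for n in count_terminators(data).values() if n > 0) > 1

# Records end in CR+LF, but one field contains a bare LF.
sample = b"id,note\r\n1,first line\nsecond line\r\n2,ok\r\n"
counts = count_terminators(sample)
mixed = is_mixed(sample)
```

Counting cannot tell whether the bare LF is data or a terminator; it can only flag that the file is ambiguous.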

By my estimation, the "90%-smart" tool would be about 30% dependable unless used only with a known source and format, meaning it wouldn't need to be smart in the first place.


> CSV files seem simple on the surface but there are all sorts of gotchas.

My point is that supporting "general CSV files" is useless. Restricting your tooling to "simple CSV files" is good. But my opinions are not very representative. I also think that it is perfectly acceptable for a shell script to fail badly when it encounters filenames with spaces.



