Hacker News

> If we manage to write a spec that meets these criteria, we'll have a powerful standard with easy adoption.

So, a binary format consisting of: (1) a text data segment, (2) an end-of-file character, and (3) a second text data segment with structured metadata describing the layout of the first. That metadata could be as simple (in meaning; its structure should be more constrained, for machine readability) as “It’s some kind of CSV, yo!”, or describe a specific CSV variant (headers? column data types? escaping mechanism? etc.), or even state that the main body is JSON, YAML, XML, etc. (which would often be detectable by inspection anyway, but this removes any ambiguity).
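A minimal Python sketch of that layout, assuming Ctrl-Z (0x1A) as the end-of-file character and JSON for the metadata trailer. Neither choice, nor the metadata keys in the usage example, comes from any existing spec. It works on `str` for brevity; a real implementation would more likely work on bytes.

```python
import json
from typing import Optional, Tuple

EOF_MARKER = "\x1a"  # assumption: Ctrl-Z (SUB) serves as the end-of-file character


def write_with_trailer(body: str, metadata: dict) -> str:
    """Return the text body, then the EOF marker, then the metadata as JSON."""
    if EOF_MARKER in body:
        raise ValueError("body must not contain the EOF marker")
    return body + EOF_MARKER + json.dumps(metadata)


def read_with_trailer(data: str) -> Tuple[str, Optional[dict]]:
    """Split on the first EOF marker; a file without one has no metadata."""
    body, sep, trailer = data.partition(EOF_MARKER)
    return body, (json.loads(trailer) if sep else None)
```

For example, `write_with_trailer("name,qty\nwidget,3\n", {"format": "csv", "header": True})` produces a string whose text part is an ordinary CSV, and `read_with_trailer` hands back the same body plus the parsed dict. A plain file with no marker simply reads back with `None` as its metadata, which is the backward-compatible case.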



You got my vibe

Almost any current CSV parser, even the bad ones, tolerates a header line.

So it should be possible to define a compact, standardized syntax that is prepended to the real header of the first column (separator, encoding, decimal separator (often disregarded by parsers but crucial outside the USA), quote character, escape character, etc.). The headers themselves would then use a special notation to carry per-column information (data type, length, comment).

Newer parsers would use these clues; older ones would just add some manageable junk to the headers.
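A rough Python sketch of how a newer parser might read such a header. The `{key=value|...}` prefix and the `name:type:length:comment` cell notation are hypothetical placeholders for whatever syntax a spec would actually define; the key point is that the prefix is read before the line is split, so it can declare the separator.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Column:
    name: str
    dtype: Optional[str] = None
    length: Optional[int] = None
    comment: Optional[str] = None


def parse_header(line: str) -> Tuple[Dict[str, str], List[Column]]:
    """Parse a header line carrying a hypothetical '{key=value|...}' prefix.

    Each header cell may use the hypothetical form 'name:type:length:comment'.
    This sketch does not handle quoted header cells.
    """
    line = line.rstrip("\r\n")
    options: Dict[str, str] = {}
    if line.startswith("{") and "}" in line:
        # Read the options block before splitting, since it may set the separator.
        prefix, line = line[1:].split("}", 1)
        for pair in prefix.split("|"):
            if pair:
                key, _, value = pair.partition("=")
                options[key] = value
    sep = options.get("sep", ",")  # fall back to a plain comma
    columns = []
    for cell in line.split(sep):
        # Pad to four parts so a missing type/length/comment becomes None.
        name, dtype, length, comment = (cell.split(":", 3) + [None] * 3)[:4]
        columns.append(Column(name, dtype or None,
                              int(length) if length else None, comment or None))
    return options, columns
```

With `{sep=;|dec=,}id:int;price:decimal:10:unit price in EUR` this yields the options `{"sep": ";", "dec": ","}` and typed columns, while a header with no prefix degrades to plain comma-separated names. One wrinkle: a legacy reader that already splits on `;` would cut the prefix at `sep=;`, so the junk lands in the first two columns rather than only in A1.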


So someone opens this CSV in Excel and there's garbage in A1?

Does this really count as compatible? You will get user bugs for this.


> So someone opens this CSV in Excel and there's garbage in A1?

Yeah, that's why I chose the “thing that looks like a text file (including, optionally, CSV) but has additional metadata after the EOF mark” approach instead of stuffing additional metadata into the CSV: there's no way to guarantee that existing implementations will safely ignore metadata added to the main CSV body. (My mechanism carries some risk too, since there are probably CSV readers that treat the file as a binary byte stream and read up to the file size rather than as a text stream that ends at EOF, but I expect there are far fewer of those than readers that will do the wrong thing with extra metadata before the first header.)


If by EOF char you mean Ctrl-Z, Python's `csv` module is at least one reader that will read past the EOF char, and you'll get rows of garbage data for any content in the file after it.
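A small Python 3 check of that claim (the trailer text is made up): write a CSV body, a Ctrl-Z byte, and a trailer to a file, then read it back with `csv.reader`.

```python
import csv
import os
import tempfile

body = "name,qty\nwidget,3\n"
trailer = "format=csv;header=yes\n"  # made-up metadata trailer

# Write body + Ctrl-Z (0x1A) + trailer to a temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as f:
    f.write(body + "\x1a" + trailer)
    path = f.name

try:
    # Text-mode open() and csv.reader treat 0x1A as an ordinary character.
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
finally:
    os.remove(path)

print(rows)
```

The Ctrl-Z byte just becomes the first character of a third row, so anyone relying on that layout would have to strip the trailer before handing the file to the `csv` module.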


No, not binary.



