Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> It's Time to Retire the CSV

> This column obviously contains dates, but which dates? Most of the world

It's time to retire local formats and always write YYYY-MM-DD (which is both the international and the Swedish standard, and the most convenient for parsing and sorting).

> A third major piece of metadata missing from CSVs is information about the file’s character encoding.

It's bloody the time to retire all the character encodings and always use UTF-8 (and update all the standards like ISO, RFC etc to require UTF-8). The last time I checked common e-mail clients like Thunderbird and Outlook created new e-mails in ANSI/ISO codepages by default (although they are perfectly capable of using UTF-8) - this infuriated me.

> If not CSV, then what? ... HDF5

Indeed! Since the moment I discovered HDF5 I wonder why is it not the default format for spreadsheet apps. It could just store the data, the metadata, the formulae, the formatting details and the file-level properties in different dimensions of its structure to make a perfect spreadsheet file. Nevertheless spreadsheet apps like LibreOffice Calc and MS Excel don't even let you import from HDF5.

> An enormous amount of structured information is stored in SQLite databases

Yet still very underused. It ought to be more popular. In fact every time I get CSV data I import it to SQLite to store and process but most of the people (non-developers) have never heard of it. IMHO it also begs to be supported (for easy import and export at least) by the spreadsheet apps. A caveat here is it still uses strings to store dates so the dates still can be in any imaginable format. Fortunately most of the developers use a variation of ISO 8601 conventionally.

And by the way, almost every application-specific file format could be replaced by SQLite or HDF5 for good. IMHO the only cases where custom format make good sense are streaming and extremely resource-limited embedded solutions.



I, too, am a fan of ISO 8601. In an insane world of date formats, it's the only sane choice.

https://en.wikipedia.org/wiki/ISO_8601


Almost nobody is a fan of the actual ISO 8601. It requires a big 'T' to separate the date from the time - people consider this ugly and rarely implement this in the wild. People mostly use RFC 3339 which allows using a space instead of the 'T'.

Both also require colons to separate hours and minutes and this makes it impossible to use in file names if you want to support accessing them from Windows.

I personally use the actual ISO 8601 (with the "T") wherever I can, simple YYYY-MM-DD-HH-mm-SS-ffffff where I need to support saving to the file system (but this is slightly harder for a human to read) and mostly RFC 3339 (with a space instead of the "T") wherever I need to display or to interop with tools written by other people. As for SQLite - I usually store every field (years, months,... seconds etc) in a separate integer column and create a view which adds an automatically generated RFC 3339 date/time column for simpler querying.


Are ISO standards ever updated? Is there any chance we might see an ISO 8601 date variant which is meant for filenames?


> Are ISO standards ever updated?

Yes, ISO standards are updated as regularly as it makes sense for standards to change (obviously they wouldn't really be standards if they were updated more often).

Wikipedia says ISO 8601 was first published in 1988, with updates in 1991, 2000, 2004, and 2019.

The C language is another example. The latest stable revision (C17) is ISO/IEC 9899:2018.

> Is there any chance we might see an ISO 8601 date variant which is meant for filenames?

Hardly.


CSV is accessible to programmers and non-programmers alike. HDF5 and SQLite require some programming ability and special software to interact with the data as they are not just text files like CSV.


This is so just because no common software supports them (I don't know why, the libraries are totally free - BSD and Public Domain).

And by the way, many (if not an overwhelming majority) of the non-programmers don't even understand what does "just text files" actually mean, how do text files differ in nature from DOC files and how are CSV files different from XLS files. They can only use CSV because Excel and LibreOffice support it OOTB and consider CSV just a weird XLS cousin needed for import/export purposes.


I had a great one today. I got a load file in (e-discovery world) that had my sent/rev'd/sort dates in "mm.dd.yyyy hh:mm" while my created and modified dates were "dd.mm.yyyy hh:mm". I can't fathom what piece of software sent that to me, and it wasn't rocket science to fix in Excel. I'm all for standardizing date format but with the # of applications that will only spit out data in one particular format I don't see it ever happening.


Honestly I constantly see dates argued about and people state various formats that are still confusing. 4-2-2 of any variety can be confused. Why not 2-3-4 or 4-3-2 (DD-MMM-YYYY or YYYY-MMM-DD)? I’ve never understood why that isn’t more widely used.


> Why not ... DD-MMM-YYYY

Because sorting. You can just sort a collection of dates stored as YYYY-MM-DD strings alphabetically and the result will always be in accordance with the actual time line.


This has so many happy side effects. Yesterday I was upgrading some code to use a newer version of an API. The old version stored timestamps as int seconds-since-epoch, but the new one stored them as ISO8601 strings. I replaced a line of code like:

  if row['timestamp'] < cutoff_time_as_int: ...
with:

  if row['isotimestamp'] < cutoff_time_as_string: ...
and everything kept working exactly as before. There was no expensive and error prone date parsing involved, some some simple string comparison. It wasn't quite as cheap as comparing ints, but it's not performance critical code so eh, good enough.


This is the answer I needed.

Thanks!


What would MMM look like? 02-005-2022?

Usually MMM refers to the 3-letter shorthand of the month, e.g. "APR" or "OCT", but I guess that's not what you meant because it couldn't be used internationally.


I do mean with the letters, but international issues makes sense.

I should know better but just never thought about it.


YYYY-MM-DD with numeric months sorts in order. Nothing else does.


If someone gives you a date like 10–04-2021 you need to know whether they’re American or not. Because it could be October 4th, or April 10th.

(In theory people could write yyyy-dd-mm, but I’ve never seen anyone actually do that).


I don't need a library to work with CSV in any language. That alone is a deal breaker is plenty of situations, no matter how widespread the format becomes.


Isn't importing and using the SQLite library utterly trivial in modern versions of the most of the programming languages? Even in Fortran and COBOL it's just `use :: sqlite` and `set proc-ptr to entry "sqlite3.dll"` respectively.




Consider applying for YC's Winter 2026 batch! Applications are open till Nov 10

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: