These are similar problems: getting a computer-amenable model of something that's fundamentally a human phenomenon, and carries multiple centuries and multiple continents of accumulated context, ambiguity, and edge-cases. This is an extremely difficult class of problems, but in the case of text & characters, we managed to finally get a pretty well-functioning and broadly-supported solution after a few decades of gratuitously-incompatible half-solutions. There's hope.
For a serialisation format, I like the Ts, because it sometimes makes munging data with cut and awk and so on slightly easier. I'd agree it's not a great display format, but then I don't think it's supposed to be.
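For what it's worth, the same trick carries over outside the shell. A minimal Python sketch, assuming the value always contains exactly one "T" (the sample timestamp is made up for illustration):

```python
# Split a serialised timestamp on its "T", much as `cut -dT -f1` would.
stamp = "2020-07-29T19:45:00"  # hypothetical sample value

date_part, _, time_part = stamp.partition("T")

print(date_part)  # 2020-07-29
print(time_part)  # 19:45:00
```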
Right, and it's distinctive, which sneaks type information past all the gremlins in your pipeline determined to strip away any and every bit of context they find in your data. On the whole I think it's probably for the best, but it is ugly.
It's pretty horrible. Apart from being hard to read (why pick an uppercase letter that looks similar to numerals?), it only works for full dates or times or datetimes. You can't have literals for just years, or just hours, or just a particular month. Even 10:20 is ambiguous -- is that twenty past 10 (am), or 10 minutes 20 seconds past the hour, or a duration (ISO also covers durations, but they are weird and clunky)?
Compare 2020-07-29T19:45 to Chinese style:
2020年7月29日19時45分
Easier to read and you can pick any subset of components unambiguously.
ISO-8601 does not allow replacing the T with a space. You are thinking of RFC 3339 “Date and Time on the Internet: Timestamps”, a profile of ISO-8601 that does allow replacing the T with a space.
> NOTE: ISO 8601 defines date and time separated by "T". Applications using this syntax may choose, for the sake of readability, to specify a full-date and full-time separated by (say) a space character.
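As a rough illustration of how this plays out in practice, here is a small Python sketch. It assumes Python 3.7 or later, where `datetime.fromisoformat` accepts either separator; the sample values are invented.

```python
from datetime import datetime

# The same instant written with the ISO 8601 "T" and with the space
# that RFC 3339 permits for readability.
with_t = datetime.fromisoformat("2020-07-29T19:45:00+00:00")
with_space = datetime.fromisoformat("2020-07-29 19:45:00+00:00")

assert with_t == with_space

# isoformat() emits "T" by default; sep=" " switches to the spaced form.
print(with_t.isoformat())         # 2020-07-29T19:45:00+00:00
print(with_t.isoformat(sep=" "))  # 2020-07-29 19:45:00+00:00
```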
Almost everything should be capable of parsing and outputting ISO 8601 dates [1] today. About the biggest recommendation that still seems to be lacking in documentation is that you should probably consider offsets (Z or -0500 etc) required rather than "optional", and that you have to remember that offsets are not timezones (and that generally you should store offsets as presented rather than convert between offsets; makes it easier to adjust offsets based on timezones).
> consider offsets (Z or -0500 etc) required rather than "optional"
An ISO8601 value without an offset is semantically different from one with an offset: it represents a time in the local timezone (context-dependent). It isn't an "optional" offset in the sense that you can just omit it; it is a fundamentally different data type.
Without this distinction, there is no way to specify a local time in ISO8601, which would be highly inconvenient for certain applications. For example, how do you represent an event that occurs at 9am every day regardless of location? After all, dates and times are used for more than just storing absolute timestamps.
You are absolutely correct that offsets are also not timezones, which makes the ability to specify local "floating" times even more important (i.e. you can't just denormalize the above concept into a list of timestamps with offsets for each timezone you care about, as the offsets will change over time [edit: and tz->offset conversion is lossy and not reversible]).
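Python's standard library happens to model this as the same split, so a short sketch may make the "different data type" point concrete. The sample values are invented, and this only shows how one library treats the distinction, not what the standard mandates.

```python
from datetime import datetime

# No offset: a "floating" local time whose meaning depends on context.
local = datetime.fromisoformat("2020-07-29T09:00:00")
# With offset: a fixed moment, keeping the offset exactly as presented.
absolute = datetime.fromisoformat("2020-07-29T09:00:00+02:00")

print(local.tzinfo)          # None
print(absolute.utcoffset())  # 2:00:00
print(absolute.isoformat())  # 2020-07-29T09:00:00+02:00

# Ordering the two kinds against each other raises instead of guessing.
try:
    local < absolute
    mixed_comparison_allowed = True
except TypeError:
    mixed_comparison_allowed = False

print(mixed_comparison_allowed)  # False
```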
personally I think the mistake is trying to fit both purposes into a single format or type. as you state, it's a fundamentally different data type. representing a fixed moment in time is hard. representing a "floating" time of day is hard. trying to do both with the same type is pretty insane, but unfortunately common.
It wouldn't really be much different if we had ISO8601.1 and ISO8601.2, would it? The standard can define two different types, and it makes it clear what it means to have a time without an offset specifier. It also defines formats for durations, intervals, and repeating intervals.
Is the distinction in representation too subtle?
We accept this subtlety elsewhere; we are used to 0 and "0" being different things, and expect them to behave differently under operators. Few would find the following surprising:
0 + 0 == 0
"0" + "0" == "00"
Then why would we expect these to behave the same?
Good point, which is why ISO 8601 does have a very different duration format (though the options for it such as calendar weeks make it a lot more obvious why it isn't just HH:MM:SS).
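To make the duration format a bit more tangible, here is a minimal Python sketch. `parse_duration` is a hypothetical helper, not a library function, and it deliberately covers only a subset: whole-number weeks, days, hours, minutes and seconds. Years and months are rejected because they have no fixed length, which is part of why the format isn't just HH:MM:SS.

```python
import re
from datetime import timedelta

# Either "PnW", or "PnDTnHnMnS" with every component optional.
_DURATION = re.compile(
    r"^P(?:(?P<weeks>\d+)W|"
    r"(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?)$"
)

def parse_duration(text):
    """Parse a subset of ISO 8601 durations into a timedelta (hypothetical helper)."""
    match = _DURATION.match(text)
    parts = {}
    if match:
        parts = {k: int(v) for k, v in match.groupdict().items() if v is not None}
    # Reject non-matches, a bare "P", and a dangling "T" with nothing after it.
    if not parts or text.endswith("T"):
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(**parts)

print(parse_duration("PT1H30M"))  # 1:30:00
print(parse_duration("P2W"))      # 14 days, 0:00:00
```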
Maybe there should be a "float" offset marker. Another thing I was reminded of is that +0000 offset should be Z (UTC), but it is often application dependent if -0000 offset is also Z as some applications use -0000 for "user local time, regardless of user". Which is related to "floating", but yet another semantic difference.
I haven't run into that use of -0000, that's insane.
I suspect a side-effect of libraries lacking support for the full range of ISO8601 (for example, refusing to parse a value without an offset, forcing people to use such hacks).
Wouldn't surprise me; iOS doesn't even have a consistent way to parse ISO8601 values with and without milliseconds.
Negative zero offset is invalid in ISO8601, but valid and carries the meaning you describe in RFC3339. I misread what that meaning actually is from your comment - it is different to a "floating" or unqualified local time, it always represents a UTC time but with an unknown local time offset.
From what I can tell, there's no library to parse the full range of ISO 8601 in Python. The standard string that people claim is "just ISO 8601" ("YYYY-MM-DDTHH:mm:ss.sssZ") is not full ISO 8601 compliance.
ECMAScript specifies "simplified extended ISO 8601", which is just that string above.
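For reference, producing that exact string from Python takes a little care. A minimal sketch, where `to_ecmascript_string` is a hypothetical helper name and the input is assumed to be an offset-aware datetime:

```python
from datetime import datetime, timezone

def to_ecmascript_string(moment):
    """Format an aware datetime as YYYY-MM-DDTHH:mm:ss.sssZ (hypothetical helper)."""
    if moment.tzinfo is None:
        raise ValueError("a datetime without an offset has no defined UTC instant")
    utc = moment.astimezone(timezone.utc)
    # timespec="milliseconds" truncates to three fractional digits.
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")

stamp = to_ecmascript_string(
    datetime(2020, 7, 29, 19, 45, 0, 123000, tzinfo=timezone.utc)
)
print(stamp)  # 2020-07-29T19:45:00.123Z
```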
Reminds me of some of my difficulty convincing old C# developers to break years of DateTime habits and move on to DateTimeOffset. One round trips ISO 8601 directly and the other uses "magic" (DateTimeKind) that is sometimes (likely) wrong.
That also reminds me that .NET documentation likes to pedantically remind me that what I think of as ISO 8601 is probably more specifically IETF RFC 3339, which defines a formal BNF grammar as ISO didn't think to do that in the 80s. (Yay, standards.)
Python adds a nice twist on making time parsing confusing. It has two strptime implementations. There is time.strptime and datetime.strptime. The former has much more limited timezone handling than the latter.
It took me quite a while staring at Stack Overflow answers and my code, wondering why the SO answers supposedly worked and my code did not, before I realized I had time.strptime and they were using datetime.strptime.
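A quick way to see which one you have is to look at what comes back. This is only a sketch with an invented sample string; note that the second function lives on the `datetime.datetime` class, so the import line decides which `strptime` a snippet is really calling.

```python
import time
from datetime import datetime

text = "2020-07-29 19:45:00 +0200"
fmt = "%Y-%m-%d %H:%M:%S %z"

as_struct = time.strptime(text, fmt)        # returns a time.struct_time
as_datetime = datetime.strptime(text, fmt)  # returns an aware datetime

print(isinstance(as_struct, time.struct_time))  # True
print(as_datetime.utcoffset())                  # 2:00:00
print(as_datetime.isoformat())                  # 2020-07-29T19:45:00+02:00
```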
Not every point in time refers to a timezone.
If I write that 1 January 1970 was an eventful date, I can't/don't want to add a timezone to it. Then you may want to store/format times without dates, etc...
One would think, but it's inconsistent unfortunately. Most platforms have input and output of ISO 8601 but they are less interoperable than you would expect. PHP's claimed ISO 8601 parsing is completely inadequate; it consists of a single format string. Postgres's ISO 8601 output is, by their own admission, broken: https://www.postgresql.org/docs/12/datatype-datetime.html#DA...
> Almost everything should be capable of parsing and outputting ISO 8601 dates [1] today.
Why is it so hard to configure a Linux system (its locale settings) to get RFC 3339 formatted dates everywhere? Sure, I can `ls -l --time-style=full-iso`, `git log --date=iso`, `date --iso-8601=s`, `date --rfc-3339=s` and change the configuration of every single application by hand. Am I missing something?
Often, people recommend the hackish `LC_TIME=en_DK.UTF-8`. This, however, doesn't work for Java applications and caused various other issues.