
>'From the perspective of the typical time_t rendering of Unix time, there is no way to uniquely represent that 61st second. It just "disappears".'

This is a great and really intuitive summary of the problem. Is there a certain class of problem related to the disappearing second? Like, are these more likely to be filesystem issues or things that rely on timestamps? Or are there second-order problems as well?

>"I can tell you that what died at that particular site was all of the locking code that used wall time clocks instead of monotonic clocks. They all CHECKed and died when their preconditions were no longer valid."

Sorry if this is a silly question, but was that check simply that "time t1 is greater than time t0"? Also, was it the size of that excursion (17 seconds) that mattered, or would this have been equally catastrophic at a single second?




Well, okay, so, the problem is like this: let's say you wanted to schedule something to happen during that exact extra second at the end of June 2015 when we were all standing around watching UTC do its little extra dance. You pick 1435708799. Trouble is, on a box that repeats the second, that Unix time applies to both 23:59:59Z and 23:59:60Z on that particular day.

You can't target it beforehand or after. It's just... gone.
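To make the collision concrete, here is a small Python sketch of the plain POSIX-style arithmetic, where every day is assumed to be exactly 86400 seconds long. It is only an illustration of the mapping, not of what any particular kernel does while the leap second is happening; the helper names `unix_time` and `render` are made up for the example. Which neighbour the leap second collides with depends on who does the mapping (a kernel that repeats a second, or the bare arithmetic below), but either way it gets no number of its own.

```python
import calendar
import time


def unix_time(year, month, day, hour, minute, second):
    """POSIX-style arithmetic: every day counts as exactly 86400 seconds."""
    return calendar.timegm((year, month, day, hour, minute, second, 0, 0, 0))


def render(t):
    """Map a Unix time back to a UTC wall-clock string."""
    return time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(t))


before = unix_time(2015, 6, 30, 23, 59, 59)
leap = unix_time(2015, 6, 30, 23, 59, 60)   # the inserted leap second
after = unix_time(2015, 7, 1, 0, 0, 0)

print(before, leap, after)   # 1435708799 1435708800 1435708800
print(render(before))        # 2015-06-30T23:59:59Z
print(render(leap))          # 2015-07-01T00:00:00Z, never ...23:59:60Z
```

Going from wall time to a number, two different seconds land on the same value; going from a number back to wall time, `:60` never comes out.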

It's not a problem from the point of view of programs, since they just get whatever time_t value they get, and they don't see the bigger picture. It's more of an outside->in mapping problem.

Put it another way: try writing a program that calls clock_gettime() and prints a message at a later time you select. You can't put in 23:59:60Z because there's no way to represent it, and indeed, you won't even be able to tell when the time comes unless you special-case it and notice _that particular second_ repeating itself... or reach into the kernel to look at the leap bit, or worse. It's a real time in meatspace, but you can't target it with the tools at hand. That's the problem.
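The "notice that particular second repeating itself" trick could look roughly like this sketch: sample a monotonic clock alongside the wall clock, and flag any interval where the wall clock gained noticeably less (or more) than the monotonic one did. The function name, the 0.5 s tolerance and the simulated samples are all made up for illustration. A system that slews or smears the leap second instead of stepping would not trip this at all.

```python
import time


def find_wall_clock_steps(samples, tolerance=0.5):
    """Return (index, drift) for samples where wall time stepped.

    samples: ordered (monotonic_seconds, wall_seconds) pairs.
    drift is wall-clock progress minus monotonic progress since the
    previous sample; about -1 means one wall-clock second was repeated.
    """
    steps = []
    prev = None
    for i, (mono, wall) in enumerate(samples):
        if prev is not None:
            drift = (wall - prev[1]) - (mono - prev[0])
            if abs(drift) > tolerance:
                steps.append((i, drift))
        prev = (mono, wall)
    return steps


def sample_clocks():
    """One live sample: (monotonic, wall). Poll this about once a second."""
    return (time.monotonic(), time.time())


# Simulated samples around a repeated second (not real measurements).
simulated = [
    (100, 1435708798),
    (101, 1435708799),
    (102, 1435708799),   # wall clock shows the same second again
    (103, 1435708800),
]
steps = find_wall_clock_steps(simulated)
print(steps)   # [(2, -1)]
```

Even then, all the program learns is that a second repeated. It still has no timestamp it could have put in ahead of time to aim at it.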

Regarding the 17 second thing, that's because someone decided to switch off the thing which (correctly) applies the adjustment factor to GPS time to make it NTP time. There was a 17 second difference at the time (GPS to NTP), and with it off, we were shipping GPS time to hosts as if it was NTP time.

In theory, any regression of the clock long enough to not let the actual passing of time push it past the sanity check time point in the lock stuff would have caused this. The thing is, a small-scale time step (from ntpd, say) normally happens at boot up, not later, and it's on a system by system basis.
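As a rough sketch of that failure mode (the `Lease` and `SteppableClock` classes are hypothetical, not the actual locking code from the site in question), here is a lease with a "time never goes backwards" sanity check and an injectable clock. A small backward step gets hidden by the time that has really elapsed; a 17 second one does not.

```python
import time


class Lease:
    """Hypothetical lock lease with an injectable clock."""

    def __init__(self, duration, clock):
        self._clock = clock
        self._duration = duration
        self._acquired_at = clock()

    def remaining(self):
        now = self._clock()
        # Sanity check: "now" must not be earlier than the acquire time.
        # Only a monotonic clock actually guarantees this.
        if now < self._acquired_at:
            raise AssertionError(
                "clock went backwards by %.1f s" % (self._acquired_at - now))
        return max(0.0, self._duration - (now - self._acquired_at))


class SteppableClock:
    """Fake wall clock for the demo; it can be advanced or stepped back."""

    def __init__(self, start):
        self.now = start

    def __call__(self):
        return self.now


wall = SteppableClock(1_000_000.0)
lease = Lease(duration=30.0, clock=wall)

wall.now += 5.0    # five real seconds pass
wall.now -= 1.0    # small step back: elapsed time hides it
small_step_remaining = lease.remaining()   # 26.0, wrong but the check passes

wall.now -= 17.0   # big step back: now earlier than the acquire time
try:
    lease.remaining()
    died = False
except AssertionError:
    died = True
print(small_step_remaining, died)   # 26.0 True

# The same lease on a monotonic clock cannot be stepped out from under.
safe_lease = Lease(duration=30.0, clock=time.monotonic)
safe_remaining = safe_lease.remaining()
```

With `time.monotonic` as the clock, the check can't fire because of someone adjusting the wall clock, which is the whole point of using it for this kind of interval bookkeeping.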

The 17 second excursion happened on hundreds of thousands of machines all at once, and, yeah, it was noticed.


Great explanations. Thanks!



