Tell HN: The 10-bit timers are about to overflow on September 17th
400 points by modinfo on Sept 3, 2022 | hide | past | favorite | 75 comments
Due to the overflow of a 10-bit week counter, some devices will "go back in time" by 1024 weeks (almost 20 years). This will occur on the night of September 17-18. The problem will affect (and this is now a sure thing) Microsemi's (aka Symmetricom's) SyncServer (100, 200, and 300 series), TimeProvider 1000, parts of TimeProvider 5000, and Timesource TS3x00 (and a few others), which are popular in industrial networks.

Loss of historical data and event logs, logging and security problems, loss of process visualization - these are some of the surprises that can happen when, having received the wrong date, other devices also decide to "time travel."

What to do?

If your network is running one of the aforementioned devices, it's best to disconnect it in advance. Unfortunately, most of them are no longer supported by the manufacturer and no patches are expected. So it looks like they will become quite useless after September 17.

This leaves very little time, therefore, to replace them with new solutions. If you are not sure whether the problem also affects your device, you can unplug it on September 17th, and if it shows the correct date the next day, plug it in again. Of course, such a maneuver is possible only in networks that can operate without GPS time synchronization for several hours.



This is not the GPS week counter rollover: https://en.m.wikipedia.org/wiki/GPS_week_number_rollover

But instead it is a device-specific rollover. It’s relatively common for GPS receivers to handle the limited week counter by having a baked-in epoch that is more recent than the latest rollover, which gives them a nearly 20 year lifetime after that epoch instead of failing at the GPS rollover.


If I'm following, the thought is: it's currently Sept 17th 2002, and our counter will roll over in 2019 (17 years). But since we know 1999-2001 has already passed, we can hard-code logic to add 20 years to any counter reporting that range. That updated logic rolls over on Sept 17th 2022, which buys us three extra years. This works for devices that will EOL within 20 years.

I think devices that need a longer life should keep track of the last seen timestamp, then they can detect a rollover. The device would keep a local rollover counter and just add 20 years per count. This should stay correct until that counter rolls over, or if the device is offline for >20 years.
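A minimal sketch of that rollover-tracking idea (illustrative only, not from any actual receiver firmware): persist the last raw 10-bit week value and bump a local rollover count whenever the new reading is smaller.

```python
# Sketch of the rollover-tracking scheme described above (hypothetical).
# A device would persist last_raw and rollovers in non-volatile storage;
# if a new 10-bit reading is smaller than the last one, the counter is
# assumed to have wrapped.

WEEKS_PER_ROLLOVER = 1024  # period of a 10-bit week counter

class RolloverTracker:
    def __init__(self, rollovers=0, last_raw=0):
        self.rollovers = rollovers  # wraps seen so far (persisted)
        self.last_raw = last_raw    # last raw week value (persisted)

    def update(self, raw_week):
        """Return the unwrapped week count for a raw 10-bit reading."""
        if raw_week < self.last_raw:
            self.rollovers += 1  # counter wrapped since the last reading
        self.last_raw = raw_week
        return self.rollovers * WEEKS_PER_ROLLOVER + raw_week
```

As the replies point out, this stays correct only while the stored state survives and the device never accepts a bogus future timestamp.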


The epoch is the date & time that the minimum value of that field refers to. For Unix derived or inspired machines it’s typically the Unix epoch of 00:00:00 on January 1, 1970. This means that a timestamp of “0” refers to that date and time, and increases at whatever granularity a timestamp value maps to. If you bake the epoch into the device at the factory as being the date of manufacture and your timestamp value has 20 years worth of granularity then you get a 20 year life from manufacture time before the timestamp rolls over to 0 again.

If you’re going to use more space to keep a rollover counter, you may as well just add more bits to your timestamp field instead, since every bit doubles the amount of time that field can represent.


> If you’re going to use more space to keep a rollover counter, you may as well just add more bits to your timestamp field instead, since every bit doubles the amount of time that field can represent.

The GPS week counter is part of the signal GPS satellites send and is not easily changed by the manufacturer of receivers.

Of course, devices that accept CNAV now have a few extra bits to work with.


Rollover fields are helpful, see https://news.ycombinator.com/item?id=32700482 for why


> I think devices that need a longer life should keep track of the last seen timestamp, then they can detect a rollover. The device would keep a local rollover counter and just add 20 years per count. This should stay correct until that counter rolls over, or if the device is offline for >20 years.

Better in some ways, worse in others.

Can still result in unexpected behavior if configuration memory is lost.

And if it ever accepts an invalid future time for whatever reason, it gets "stuck" in that failure state.


It is easier than that. You can use unsigned integer wraparound.

Truncate the current wall clock time to a 10 bit unsigned, then subtract the number reported by the device, and interpret the result as an unsigned 10 bit. That's the amount of time that has passed since the device emitted its timestamp. (Assuming at most one wraparound event.)
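A sketch of that computation (assuming, as stated, at most one wraparound): truncate both values to 10 bits and let modular arithmetic absorb the wrap.

```python
MASK = (1 << 10) - 1  # 10-bit field: values 0..1023

def weeks_elapsed(now_weeks, reported_weeks):
    # Subtract in 10-bit unsigned arithmetic; a negative difference
    # wraps around to the correct positive elapsed-week count.
    return (now_weeks - reported_weeks) & MASK
```

For example, if the wall clock truncates to week 3 and the device reported week 1023, the result is 4 weeks, not -1020.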


Counters roll over, this is not new.

What matters is how rollover is dealt with.

For interest, we (mostly) all use GPS .. and the 'onboard' broadcast 10 bit GPS week counter has already rolled over twice:

* midnight (UTC) August 21 to 22, 1999

* midnight (UTC) April 6 to 7, 2019

The GPS "seconds since midnight last Sunday" timer resets to zero every week.

[1] https://en.wikipedia.org/wiki/GPS_week_number_rollover

[2] Satellite Geodesy Günter Seeber https://www.geokniga.org/bookfiles/geokniga-seeber-g-satelli...


Note that GPS week number is now 13 bits long since the introduction of CNAV navigation message format; devices only accepting the original NAV format are now pretty rare.


Sure, the seconds are still lapsed seconds since (weekly) epoch . . .

(Not to mention the post corrections broadcast upwards from the ground to adjust for relativistic time drag between Earth surface nominal 1G and sat orbit height gravity)

Point being, many timing applications use lapsed time since X counters and are part of systems designed to handle rollover.

The mindset of just adding more bits to a counter or recording more decimal points isn't always appropriate or the 'best' fix.


> to adjust for relativistic time drag between Earth surface nominal 1G and sat orbit height gravity

That's just the General Relativity correction. There's also a Special Relativity correction to account for time dilation caused by the satellite's speed, which works in the opposite direction to the GR correction, though the two don't exactly cancel each other. Plus there are some others (Doppler shift, Sagnac effect, etc.)

https://www.aapt.org/doorway/TGRU/articles/Ashbyarticle.pdf


I'm confused... were 10-bit timers being used in recent history? Also, they're using weeks as their atomic unit?

This seems like something that never should have existed.


The GPS system uses 10 bits for the week number part of the broadcast timestamp. This causes a rollover [1] every 19.6 years, and devices that aren't designed to anticipate it will report the current date/time as being two decades (or multiples of that) in the past.

I guess the ultimate cause is data framing in the transmission protocol. The timestamp contains the number of weeks since 00:00:00 1980-01-06, together with the number of seconds since the start of the current week. The number of seconds in a week won't fit into 16 bits, so my supposition is that the designers had to also use some of the bits that could otherwise have been used for a wider week counter.
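For concreteness, the timestamp layout described above converts to a calendar date like this (a sketch; `leap_seconds` is the GPS-UTC offset, 18 s at the time of writing and changing over time, and rollover handling is omitted):

```python
from datetime import datetime, timedelta, timezone

GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)

def gps_to_utc(week, seconds_of_week, leap_seconds=18):
    # GPS time runs ahead of UTC by the accumulated leap seconds.
    return GPS_EPOCH + timedelta(weeks=week,
                                 seconds=seconds_of_week - leap_seconds)
```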

I'd like to think that nowadays we'd use self-describing, upgradeable protocols. But GPS was designed in the 70s for the constrained technology of that era. And I'm pretty sure nobody anticipated how widely deployed it would be 45+ years later.

[1] https://en.m.wikipedia.org/wiki/GPS_week_number_rollover


You've missed two key points:

* GPS depends upon accurate timing and positions.

* GPS broadcasts a data packet.

The length of the packet is limited by the broadcast frequency and the time to update .. so the packet was kept tight.

The satellites drift (orbital decay, magnetic torque, solar wind, random particle pressure) and live in a different gravity (aka relativistic time) .. their 'self' data (position | time) is regularly updated from ground stations and their internal epochs only need to operate for some low N number of expected update cycles .. if they lack an accurate sense of 'self' they lack function as waypoints.

Adding clock data widths that count nanoseconds for the lifetime of the sun would be pointless: it makes the data packet longer and it doesn't change the hard requirement for regular time|position updates from the ground.

They were designed with short epoch counters with an understanding of functional constraints.


Still surprised, every time I see it mentioned, to have a cultural construct like a week in the data model. Likely a deliberate break from SI to avoid ambiguity between monotonically increasing counters and calendaric time, but surprising nonetheless.


They were fighting the good fight of whether a week begins on Sunday or Monday… /s


> I'd like to think that nowadays we'd use self-describing, upgradeable protocols.

That would open a whole new can of security worms though. Being able to modify a protocol in-band is something we're starting to move away from. Things are becoming more static as a precaution, like stored procedures on SQL so an attacker can't inject a change.


Stored procedures were the norm when I was a young programmer. Dynamic SQL is more recent and was the cause of SQL injection vulnerabilities.


> And I'm pretty sure nobody anticipated how widely deployed it would be 45+ years later.

Or maybe they thought by now we would have replaced it with something better. Spacefaring was still riding high in the '70s, the brakes were only applied after '89 (sadly one of the many indirect effects of glasnost, RIP Gorbachev).


GPS was originally only for military use; opening it up to civilians didn't come until later. They didn't anticipate all of us pesky civilians coming up with creative new uses for it (like time keeping in our networks) or putting it in millions of devices that don't have strict maintenance schedules.


What do you consider recent history? The design work for GPS started about 50 years ago. Shaving bits off of the L1 transmission also leads to a small but real improvement in acquisition time. Every bit you waste on a counter that only changes every ~20 years means receivers have to wait that much longer to get the latest ephemerides (this is less of an issue nowadays since cell phones can just download almanac and ephemerides from the internet).

Weeks worked nicely as a time unit due to the periodicity of the p-code signal.

If you have any doubts about the competency of the people who designed GPS, IMO it's worth checking out the Woodford/Nakamura report that considered a lot of alternative implementations. From an engineering perspective, they had a lot of tradeoffs to consider but typically took the direction that made the system more user-friendly but harder to implement (like not requiring receivers to have atomic clocks or transmitters). It's also interesting to see how that sort of info was presented in the pre-PowerPoint days.


> This seems like something that never should have existed

The engineers who designed and implemented the GPS system many decades ago were pioneering a fundamental technology, and worked within the constraints at the time to balance efficiency, cost, and complexity deliberately and as best they could.

Now we get to drive around in our luxurious cars, decked out with supercomputers listening to satellite communications, charting our course for us, all so we don't have to read our own maps.


Here are a few links. This is the key one. The units with this problem are obsolete and Microsemi has done very little to publicize this. This means you have a ticking time bomb if you thought everything was good because you got through the 2019 rollover. There are a few select models where a return-to-factory firmware update can help you dodge the problem, but it is probably too late to get something turned around now if this is the first you have heard of it. If all you need is a 10 MHz or 1PPS reference, those bits will continue to function. If you need it for NTP, you've got a problem.

https://sync.empowerednetworks.com/keep-your-ntp-server-curr...

Another link stating "But other GPS receiver manufacturers set a delayed date for the rollover date to occur on – September 18th, 2022". "Other" in this case = Symmetricom / Microsemi.

https://www.orolia.com/will-your-network-time-servers-be-aff...


I'm flying on 17th September. Looking forward to it now.


Same. I hope the Atlantic is warm this time of year.


Too bad they didn't use a float.


It will go swimmingly?


It's called "Global Positioning System" not "Global Time System".

Granted, it's a pretty accurate time system in the sense that it needs to know the time to an accuracy determined by the speed of light crossing the globe: that's how it knows where you are, that's its job.

Days, weeks, months, years... feh! In 7 minutes the Earth transits its own diameter; in 4 hours, the distance to the Moon; in a year it's (roughly) back where it started.

Great hack, but maybe you ought to know what year it is if it matters to you. No way to figure that out? No clock available which is accurate enough?


GPS is widely used as a cheap way to get a timestamp from an atomic clock. Most datacenters will have a GPS receiver to seed their NTP services, for instance.

At the massive scale of GPS, every observable behavior of the system is a feature that someone is relying on.


People designed new devices with just a 10-bit time counter after Y2K?!


It's something that's inherent in the GPS standard, which was developed considerably before Y2K, and it's something that's very very easy to correct for.

Are you surprised that people are still designing hardware where the time rolls over to 00:00 every 24 hours?


The signal is 10-bit, but it seems silly to make the receiving device 10-bit and suffer this problem when correcting for it is easy. OP says it wasn't always corrected for, even by people with the lesson of Y2K behind them, which is what surprises me.


Every one of the devices failing at this particular date has been 'corrected' for it: they shift the wrap point based on their manufacturing date. The natural wrap point isn't now; the second natural wrap was in 2019.
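That wrap-point shift can be sketched as follows (names illustrative, not from any actual firmware): the firmware bakes in the full week number at build time, and any raw value that would fall before it is assumed to belong to the next 1024-week block.

```python
WEEKS = 1024  # period of the 10-bit broadcast week counter

def resolve_week(raw_week, build_week):
    # build_week: full (unwrapped) GPS week number at firmware build time
    # raw_week:   10-bit week number (0..1023) from the broadcast signal
    week = (build_week // WEEKS) * WEEKS + raw_week
    if week < build_week:
        week += WEEKS  # assume the counter wrapped after the build date
    return week
```

This gives a device roughly 1024 weeks of correct operation from its firmware build date, which is exactly why each vendor's devices fail on their own schedule rather than at the natural GPS rollover.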

Correcting for it more elaborately than that is challenging. In particular, if you depend on remembering the number of wraps, a replacement part pulled off the shelf will return wrong dates when the original was fine; and if you remember, then a false (or spoofed) signal can put the device in a bricked state.

Most GPS receivers (including most of the ones listed here) can be temporarily fixed by setting the date manually. But their interfaces may be inaccessible in the devices they're installed in and the fix will be lost after restart.

The L1C/L2C/L5 GPS signals use a 13-bit week number. Hardware for the new signals was first deployed with the IIR-M satellites in 2005 with L2C support (delayed from something like 1997)... but no broadcasting of the L2C navigation message started until 2014. Right now very few receivers support the new signals.


Luckily the 24 hour clock comes with a rollover counter, which may be confusingly named something like “day.” Consult accompanying documentation for details.

If your hardware vendor did not include a “day” counter, you may need to implement in wetware.


But then that rolls over every seven "days" incrementing the "week" counter, and then either 28, 29, 30, or 31 days in a bizarre pattern incrementing the "month" counter. Then - get this - every 365.25 days it increments the "year" counter, which is just so spectacularly half-assed. Three hundred and sixty-five *and a quarter* days. WTAF.


You could store the number of milliseconds since an arbitrary date in Greenwich UK, and provide methods to get the time, but even this has problems.

The units could be defined by atomic vibrations or by the spinning of a large celestial object, but these don't agree. What's more, the large celestial object (and others nearby) distorts space and time, adding to the issue of where the time is (see my past submission on Barycentric Time).


My device doesn’t have a day counter but it makes a hell of a buzz in the morning to wake me up and runs off stored mechanical energy


Baked-in obsolescence.

Everyone knew of these issues and has known of these issues for ages. Why would they not address these well-known issues? The desire to sell new devices and/or support is most likely why.


The OG GPS satellites were designed in the 1970s


Ynot2K?


This is maybe a very dumb question but was it so expensive to add several bits (and extend its lifespan exponentially) to what is basically a counter?


I remember how we once ran into trouble with a large timestamp counter in a FPGA implementation. (Was it just 64 bits, or 112 bits? Probably the full PTP timestamp, including fractional nanoseconds for precise drift corrections.)

The extra bits of storage are cheap. The problem is the addition circuitry. With a small counter, you can do addition in a single clock cycle, very easy to implement. With a large counter, addition of the highest bit has to wait for completion of the previous one, etc. so if this takes longer than one clock cycle you have to implement a different, pipelined addition algorithm. (Or run everything at lower clock frequency.)


It can be quite surprising how what might seem like minor differences to the programmer can require major changes to the hardware.

I saw an MITx class on EdX called "Computation Structures" a few years ago and took it for fun. In the second part of that students design at the logic gate level a 32-bit RISC processor, except that we could assume black boxes for the register file and memory.

I considered trying to actually build mine using good old classic 74xx series logic. Mine would have needed 140 quad MUX2 chips, 75 quad AND2 chips, 81 dual MUX4 chips, 55 quad XOR2 chips, and 16 dual D flip-flop chips, plus a handful of others.

It was around 370 chips in total.

My design included a shift unit that can do left or right shifts, arithmetic or logical, of 1 to 31 bits in 1 clock cycle.

If I replaced that with a shift unit that could only do 1 bit logical right shift, and changed the instruction set to have a new instruction for that, made the old shift instructions trap, and then emulated them in the trap handler, a whopping 88 of those 140 quad MUX2 chips would no longer be needed.

That would bring it down to around 280 chips. The fancy shifter was almost a quarter of the parts count!


Naive question. Do processors ever have sub-elements that run at a higher clock. I can imagine trying to hack this sort of thing by putting some sort of subprocessor structure that does addition for a particular set of registers at twice normal speed (double length registers?? I'm clearly spitballing). I guess it can't because of memory bandwidth constraints?


> Do processors ever have sub-elements that run at a higher clock

Yes, this is called a "clock domain"; there may be quite a lot of them, and they can often be powered off individually.

> I can imagine trying to hack this sort of thing by putting some sort of subprocessor structure that does addition for a particular set of registers at twice normal speed

It's the other way round: a particular arrangement of logic elements, at a particular size and on a particular wafer process, at a particular temperature, will have a worst-case timing. That timing determines how fast you can possibly clock that part of the circuit.

Adders are annoying because of carry: you can't determine the top bit until you've determined the effect of carry on all the other bits. So if it takes, say, 250ps to propagate through your 32-bit adder, you can clock that at 4GHz. If you widen it to 64 bits it takes 500ps, and now you can only clock that part at 2GHz.


You may know this, but the person you responded to almost certainly doesn't based on their question, so:

Carry look-ahead adders are a thing. The number of logic levels for computing the highest carry bit is logarithmic in the width of the numbers being added, not linear. Doubling the width of the numbers does not cut your clock rate in half, though you do have to pay for the faster cycle time in added area (more logic gates). There are all sorts of different trade-offs that are possible in the constant terms, but the standard adder designs have linear area in the number of bits, and logarithmic propagation time from inputs to outputs.


> the standard adder designs have linear area in the number of bits, and logarithmic propagation time from inputs to outputs.

It's a linear number of gates, but slightly worse than linear area due to routing. (I've looked for a strictly-linear design before, so if you have such a design, I'd quite appreciate a citation.) It's still much better than N log N area, though, so your point stands.


Admittedly I hadn't considered routing. Do you have some reference, and how does it scale? Worse than linear but better than N log N is unusual...


If you lay out a carry-lookahead tree the naive way, it occupies an N x log N rectangle with the N-bit adder along the N side, but most of the rectangle is empty (each layer has half as many gates as the previous). You can easily pack it more densely on an ad hoc basis, but I don't know of a systematic approach, so possibly it's only a (mildly large) constant factor better than the naive way at scale.

> Do you have some reference

Nope; that's why I was asking for one.


Normally you'll do the opposite: you'll have sub-parts which run at a lower clock rate. The 'core clock' is generally the fastest clock in the system, at least outside of specific high-speed transceivers. The most common approach is to pipeline the operation, which increases latency of the operation but still gives you the same throughput so long as the output does not need to feed back into the input within the operation time.


To add onto this you can easily increase the clock for specific clock domains using a PLL (Phase locked loop) as a clock multiplier.

This comes with all kinds of caveats and complications (PLL clock multiplication is "fuzzy" and subject to phase drift and variations in actual frequency, while normal clock division is essentially as close to exact as you can really get). Because of this, as mentioned, it's preferable to use clock division instead of clock multiplication where possible.


PLLs are also fairly large analogue circuitry in your otherwise digital logic.


> With a large counter, addition of the highest bit has to wait for completion of the previous one, etc. so if this takes longer than one clock cycle you have to implement a different, pipelined addition algorithm. (Or run everything at lower clock frequency.)

Kogge-Stone carry lookahead.

You can calculate the upper bits in parallel using Kogge-Stone (which is the predecessor to what GPU programmers call "prefix-sum" parallelism).
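A software model of the idea (illustrative only; real hardware does this at the gate level): each doubling step merges carry generate/propagate information over twice the distance, so an N-bit add settles in log2(N) stages instead of N ripple stages.

```python
def kogge_stone_add(a, b, width=32):
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    g = a & b   # bit positions that generate a carry
    p = a ^ b   # bit positions that propagate a carry
    d = 1
    while d < width:           # log2(width) iterations
        g |= p & (g << d)      # extend carries across distance-d spans
        p &= p << d
        d *= 2
    # sum = inputs XOR the computed carries shifted into place
    return ((a ^ b) ^ (g << 1)) & mask
```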


> was it so expensive to add several bits

If this is the GPS week number, there is very limited space for it in the GPS navigation message; every extra bit used for the counter means one less bit for everything else. From what I've read, the navigation message is transmitted at only 50 bits per second, and the time including the week number repeats every 6 seconds, so they had only 300 bits to play with. Given that a 10-bit week number already allows for nearly 20 years, a receiver only needs to know which decade it is today for that value not to be ambiguous, and that should be easy for a receiver with a few bits of local storage unless it's been powered off for more than a decade.


> and that should be easy for a receiver with a few bits of local storage unless

unless it gets a single spoofed or corrupted signal, then it's stuck in the future.

Of course you can implement countermeasures, like requiring months of observation to advance the flag, but that makes it much less easy (and harder to test).

> it's been powered off for more than a decade.

That's also a concern. You don't want a receiver to get hit by lightning then get swapped out with a spare on the shelf and suddenly get the wrong date. Now you need a provisioning step that requires remote access to a normally completely embedded component.

Again, not insurmountable-- but it's getting away from easy.


It's probably less about adding a few bits to the counter, and more about replacing the existing counters that are already deployed everywhere in the world. Industrial applications usually go by "if it ain't broke, don't fix it" and stick with decades-old machines as long as they still get the job done.


The question is if it would have been that expensive to make it more than 10 bits to begin with


We as a human race have been terrible at anticipating how fast numbers grow. A similar story: with databases, we used to think that 32-bit primary keys were plenty big enough to store all the numbers we'd ever need. In all likelihood, the people who manufactured these timers thought their equipment would never outlive the need for 10 bits.


There's a difference between not anticipating use of products changing in scale to the point that database PKs might need to be longer (which I would assume, albeit as an uneducated guess, has seen closer to exponential than linear growth in terms of maximum length required for unique IDs in databases), vs. anticipating "this timer will definitely hit a limit in 2022, unless the physics of time as we know it somehow changes before then".

Sure it's still a relatively easy mistake to either wrongly assume that all of whatever you're creating will be in landfill by 2022 if you were making it long enough ago that all devices would be dead by now for other reasons (or to unethically do that intentionally, to cause customers to repurchase products this year - seems pretty unlikely but perhaps...), or to just forget to think about it, or think about it and assume it's fine for now and can be checked later before then forgetting.

But "We as a human race have been terrible at anticipating how fast numbers grow" - we're talking about a number that's growing exactly in sync with time, and with an entirely simple way of checking when the limit will be hit, so I think it's just sloppy/lazy or expecting less longevity of use for their product, more than a flaw of humanity.


100% agreed. When it comes to any kind of incrementing counter, IMO the limit should always be multiple orders of magnitude greater than the expected lifespan of the system.

Taking OP at face value, these devices' counters have a zero point of February 1, 2003, so (given that nobody knows what happens when they roll over) they must have been designed after that point.

In the world of the 2000s even 16-bit processing is old school, so there's no good reason this couldn't have been a 16-bit counter instead. If that had been the case the rollover wouldn't come until the year 3259.
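For what it's worth, the 3259 figure checks out, assuming the February 1, 2003 zero point mentioned upthread:

```python
from datetime import date, timedelta

# 2**16 weeks of a 16-bit counter, from the assumed 2003-02-01 zero point
rollover = date(2003, 2, 1) + timedelta(weeks=2**16)
print(rollover.year)  # → 3259
```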

IMO this is the correct way to handle situations where you think you don't need any more, round up from your reasonable limit to the next major "bit border". If you have a 10 bit counter, round up to 16. If you need 16, then make it 24 or 32. The limit shouldn't just be "above where we expect it to reach in its lifespan" but "if you hit this the device belongs in a museum".

Designing a device with a 10 bit week counter in the 2000s is bad design, prioritizing either laziness or cost over quality.


> We as a human race have been terrible at anticipating how fast numbers grow.

But in this case it was easy ... it's a timer!


Numbers grow unpredictably, it's true. People in the 1970's didn't anticipate that we'd get to the time Sep 2022. :) How time flies.


> People in the 1970’s didn’t anticipate that we’d get to the time Sep 2022.

No, they just didn't anticipate that a GPS receiver would not know which decade it is (knowing the decade is enough to disambiguate a 10-bit week number) even though it knew the correct date a few moments ago. That is, they didn't anticipate counter rollover bugs caused by hard-coding of the starting point of the counter, instead of calculating it based on the last seen date.


A last-seen-date scheme means a single spoofed signal yields future dates forever.

Give some credit to the engineers who built these receivers and their firmware. They're extremely complex and nuanced. You can be absolutely sure that they carefully considered their options here and chose a tradeoff that made sense given the constraints they were designing for. I'm confident they anticipated rollover; they just concluded that the fixes beyond using the mfgr date as a base point were worse than the disease.

It's easy to criticize from the future when we know what happened, e.g. that there was an almost 20-year delay in deployment of the modernized GPS signals and many people kept using these devices far longer than anticipated.


most of the people in the 1970’s didn’t get to the time Sep 2022!


No. But which way did incentives align?


Are there any references for this? What is the "epoch" here?

1024 weeks before Sept 17 is February 1, 2003. What is significant about that date?
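As a sanity check on that arithmetic:

```python
from datetime import date, timedelta

# one full period of a 10-bit week counter before the rollover night
print(date(2022, 9, 17) - timedelta(weeks=1024))  # → 2003-02-01
```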

References welcome.


Probably what this comment is suggesting: https://news.ycombinator.com/item?id=32700347

That given the 10-bit limitation, some devices chose their own epoch date to get as much use of the 10-bits as possible. Meaning they chose some arbitrary date earlier than the device manufacture date.


I'm wondering how comprehensive the research is that says it's just those 3 vendors and ~7 devices. Given that it's more than one vendor, it feels like a pattern that's a common mistake or design compromise. I wouldn't be surprised if the impact is broader than expected.


All GPS receivers(1) have this issue, but the date they experience it on differs from device to device because, to mitigate the rollover, they adjust the wrap point in their firmware based on when the firmware was made.

(1) Technically the modernized L2C/L1C/L5 signals use a 13-bit week number, but because the deployment of these signals was massively delayed, receivers using them are basically non-existent today. (I also have no idea whether current receivers supporting them successfully use the 13-bit week number, given that they're not guaranteed to be able to receive the modernized signals, which are still less available than L1.)


Wouldn't some of them have rollover counters or adjustable epochs, etc?


The epoch is adjustable by updating the firmware... if there is an updated firmware. That's the reason for the current batch of devices dying now: it's based on a date set in their firmware. Unfortunately these are embedded receivers so making anything adjustable isn't necessarily easy.

I'm not aware of any devices that implement a rollover counter but there may be some, though a rollover counter needs to figure out how to deal with corrupted or spoofed signals and will have issues like spares off the shelf not working like the device they're replacing.


Ah, re-reading, it sounds like these are all from one vendor that's known by two names..."Microsemi" and "Symmetricom" (acquisition, etc). So perhaps that risk is low.


What kind of industrial applications? Are we talking about manufacturing or grid/infrastructure?


Anything that uses and depends on time synchronization, basically. Which is a huge list, from railways to game servers. But again, do they all use these devices, or 10-bit counters in particular? I have no idea.



