The perils of transition to 64-bit time_t

codys · 2024-09-28T18:12:02 1727547122

There are a few options for Gentoo not discussed in the post, possibly because for Gentoo they would be a larger amount of work due to the design of their system:

1. Allow building against packages without installing them. The core issue here is that Gentoo package build and install happen as a single step: one can't "build a bunch of things that depend on one another" and then "atomically install all the build items into place". This means that Gentoo can easily be partially broken when one is doing updates when an ABI change occurs (modulo so versioning, see next option). This issue with 64-bit time_t is an example of an ABI change that folks are very aware of and is very widespread. It's also an example of something that causes an ABI change that isn't handled by the normal `.so` versioning scheme (see next option).

2. Extend the normal `.so` versioning to capture changes to the ABI of packages caused by packages they depend on. Normally, every `.so` (shared object/library) embeds a version number within it, and is also installed with that version number in the file name (`libfoo.so.1.0.0`, for example, would be the real `.so` file, and would have a symlink from `libfoo.so` to tell the linker which `.so` to use). This shared object version is normally managed by the package itself internally (iow: every package decides on their ABI version number to enable them to track their own internal ABI breakages). This allows Gentoo to upgrade without breaking everything while an update is going on as long as every package manages their `.so` version perfectly correctly (not a given, but does help in many cases). There is a process in Gentoo to remove old `.so.x.y.z` files that are no longer used after an install completes. What we'd need to do to support 64-bit time_t is add another component to this version that can be controlled by the inherited ABI of dependencies of each `.so`. This is very similar in result to the "use a different libdir" option from the post, but while it has the potential to set things up to enable the same kinds of ABI changes to be made in the future, it's likely that fixing this would be more invasive than using a different libdir.

akira2501 · 2024-09-28T19:09:45 1727550585

> Allow building against packages without installing them.

A partial staged update system would work the best. I should be able to schedule the build of several new packages, have them built into a sandbox, and have new compilations use the sandbox first and then fallback to the system through a union, and then once everything is build finally package up all the pieces and then move them out of the sandbox and into the system at large.

You could make all gentoo updates transactional that way which would be a huge boon in many other ways.

account42 · 2024-10-02T13:27:14 1727875634

You definitely don't want all upgrades to be transactional, at least not without being able to reuse partial results. E.g. having to rebuild an llvm update because some other package failed to build would suck.

codys · 2024-10-08T17:30:31 1728408631

Presumably one wouldn't throw away the built packages if some other build fails just because we're delaying installing the packages to `/` until later.

aaronmdjones · 2024-09-28T19:15:14 1727550914

Gentoo already allows option 1 by specifying a directory to ultimately install the finished image to (normally it is set to /) [1]. One can do a complete rebuild of everything in @system and @world, install it to the specified subdirectory, and then sync it all over in one shot. Preferably you would do this from a live session, although in theory you could also bind-mount / onto a subdirectory of the place you reinstalled everything to, chroot into it, and then sync it (what is now /) into the bind-mounted real upper /.

[1] https://devmanual.gentoo.org/ebuild-writing/variables/#root

codys · 2024-09-28T20:21:18 1727554878

It's true that Gentoo has some pieces of what they need to build an implementation of option 1 (build packages that depend on other packages before completeing an install), but it's currently the case that that is not what happens when one runs `emerge` (the gentoo packaging build/installing tool), as you've noted one would need to write scripts that wrap emerge (or do the same work manually) to attempt to accomplish this today.

I suspect that using bind mounts and overlays (to allow the "building packages" chroot a view of the "installed" root) could be used to accomplish this, or alternately some filesystem snapshotting features if we're thinking about this from the "external to emerge" angle. (It's my understanding that for Gentoo to do this ABI change, though, they need some solution integrated into emerge).

To some extent, this kind of potential model also reminds me of systems that integrate their package updating with filesystem snapshotting to allow rollbacks and actually-atomic upgrades of many files. I think one or more of the solaris distributions did this?

aaronmdjones · 2024-09-28T23:51:08 1727567468

> but it's currently the case that that is not what happens when one runs `emerge` (the gentoo packaging build/installing tool)

This would be a simple matter of changing /etc/portage/make.profile to point to a profile that uses the new ABI and setting (and exporting) the ROOT variable in the shell you type "emerge" into (or add this ROOT=/whatever/ line to /etc/portage/make.conf), is what I was getting at. There's no need to write any wrappers.

codys · 2024-09-29T00:17:50 1727569070

Simply changing `ROOT` doesn't magically enable us to have `emerge` update our normal `ROOT=/`. There's extra steps here (the syncing, the bind mounting, etc) for the `emerge` with a custom `ROOT` to do an update to `/` itself.

And that's if we have a `ROOT` that allows us to source dependencies (libraries to link against, includes, etc). Currently it's unclear to me how exactly `ROOT` works, probably meaning there needs to be a bit more documentation there and potentially indicating that it might not be used much right now.

aaronmdjones · 2024-09-29T01:19:18 1727572758

> There's extra steps here (the syncing, the bind mounting, etc) for the `emerge` with a custom `ROOT` to do an update to `/` itself.

Those extra steps are handled by telling people what to do, in news items. This happened for example in the 17.1 -> 23.0 profile migration [1]; emerge wasn't updated to help people do this in any way whatsoever, users are expected to do everything themselves. This is the Gentoo way.

> And that's if we had a `ROOT` that allowed us to source dependencies (libraries to link against, includes, etc).

You don't need to source anything in ROOT because everything you're rebuilding is already in /.

If you intended to build a program that depended upon a library you did not already have (in /), you're right, this wouldn't work. That isn't the case here. The dynamic linker (ld-linux.so or whatever you fancy, the ELF interpreter, the thing that runs before main(), the thing that will look at the list of dependent libraries, try to load them, and balk at any mismatches) isn't executed until the program is, and setting ROOT merely tells portage where to put them; it doesn't execute them. This I alluded to in the other comment thread about building a userland that your CPU can't even execute.

[1] https://www.gentoo.org/support/news-items/2024-03-22-new-23-...

codys · 2024-09-29T01:31:10 1727573470

Huh. If it's OK to have instructions like those for users to run through, then I guess I don't understand what's holding up switching to 64-bit time_t for gentoo. Seems like they could make a new profile and tell folks to deal with it. Perhaps I don't understand what the requirements are here, which might indicate that the requirements aren't clear for others too.

rini17 · 2024-09-28T22:42:47 1727563367

The third paragraph says following, which means it's not suitable solution to build whole system:

When building a package, ROOT should not be used to satisfy the required dependencies on libraries, headers files etc. Instead, the files on the build system should be specified using /.

aaronmdjones · 2024-09-28T23:47:20 1727567240

Yes, that means it will look in the host system (/) for those dependencies. But when you're also recompiling and installing those dependencies to /whatever/ as well anyway, and /whatever/ is eventually going to end up being /, this is fine. The original / doesn't get touched until everything is done, and /whatever/ is all built to the new ABI, which it can be precisely because nothing in /whatever/ is getting consulted or used during the rebuild process.

codys · 2024-09-29T00:25:51 1727569551

No, what rini17 pointed out is a problem here, and I think my previous comments that setting ROOT might be "part of a solution" are less accurate (as it seems `ROOT` is even less of a complete solution here).

Lets consider this hypothetical representation of the 64-bit time issue:

In `ROOT=/`, we have a `libA.so` with 32-bit time_t, and a `libB.so` with 32-bit time_t that depends on `libA` (ie: `libB.so` -> `libA.so`).

We rebuild package `A` with a special root (`ROOT=/tmp/new`) which has 64-bit time_t. We now have a `/tmp/new` with `libA.so` built for 64-bit time_t. We now want to build `B` for `ROOT=/tmp/new` with 64-bit time_t and have it link against the new `/tmp/new/lib/libA.so`.

But setting `ROOT=/tmp/new` doesn't cause us to use the `/tmp/new` to as our source to link against things. So (as it currently stands) we'll end up trying to link `libB.so` (64-bit time_t) against `/lib/libA.so` (32-bit time_t), and things just won't work.

If gentoo changes/extends ROOT (or adds something else like ROOT but used to locate dependencies for linking), then it could be a piece of a solution. But it sounds like it's pretty far from that in intention/practice right now.

aaronmdjones · 2024-09-29T00:33:31 1727570011

> But setting `ROOT=/tmp/new` doesn't cause us to use the `/tmp/new` to as our source to link against things

That's a good thing, because it won't be /tmp/new/ when we're finished.

> So (as it currently stands) we'll end up trying to link `libB.so` (64-bit time_t) against `/lib/libA.so` (32-bit time_t), and things just won't work.

It won't work until /tmp/new/ becomes /, which it doesn't have to until you're done.

Remember that this is just for building and installing the software. It doesn't get executed which is precisely what you want because it wouldn't work properly otherwise, as you point out.

You can even use this method to build a complete userland for an instruction set architecture that your CPU doesn't even support. All you need do is also set CHOST to the tuple identifying the target architecture in addition to setting ROOT for where it should be installed [1]. You just wouldn't be able to chroot into it obviously.

[1] https://wiki.gentoo.org/wiki/Embedded_Handbook/General/Intro...

codys · 2024-09-29T01:26:18 1727573178

Before getting into the details here about why we need to read `libA.so` at build time: Perhaps the documentation gentoo provides about `ROOT` that was noted by rini17 is incorrect/misleading and `ROOT` is used as the source of build time dependencies automatically and the documentation was intended for ebuild writers and not emerge users? (in which case, Gentoo should fix/clarify it).

"link" has 2 meanings:

1. As you've pointed at, at runtime we have linking occur that reads files.

2. Additionally, building software with `gcc`/etc, we tell it what libraries it needs to link against (`-lA`). When we do that, `gcc` as part of the build process searches for `libA.so` (in various locations dependent on its configuration and arguments), reads the `libA.so` file, and uses that info to determine what to emit in the object file that `gcc` is building. So we need to read a shared library. The same is true for headers used at build time. How much this actually ends up mattering varies depending on architectural details, and whether libraries have different symbols exposed (or generate different header files) depending on what the bit-width of time_t is. It may well be that for most packages we could have the build-time linking occur against a 32-bit time_t object as long as runtime linking is done to the 64-bit time_t object (assuming one is using identical package versions or at least versions with a compatible ABI).

aaronmdjones · 2024-09-29T01:39:53 1727573993

> and whether libraries have different symbols exposed (or generate different header files) depending on what the bit-width of time_t is

Hmm. You have a good point there; I hadn't thought about that...

Perhaps they can take an existing stage3 with 32-bit time_t, build a new stage1 through 3 with a 64-bit time_t, and compare the symbols and header files between them to identify any potential trouble spots. This would help them regardless of what migration course of action they decide upon.

Any software people have installed that isn't included in the stage3 (e.g. desktop window managers and everything else on top) wouldn't be included here and so is a potential pain point, but everything in @system would be, so at that point you can just rebuild @world again to fix everything.

account42 · 2024-10-07T07:49:05 1728287345

Gentoo already does retain old libraries until all dependends are updated so encoding the abi change in places where ABI changes are supposed to be encoded (architecture and SONAME) would solve this problem.

eschaton · 2024-09-28T21:36:09 1727559369

The way this was handled on Mac OS X for `off_t` and `ino_t` might provide some insight: The existing calls and structures using the types retained their behavior, new calls and types with `64` suffixes were added, and you could use a preprocessor macro to choose which calls and structs were actually referenced—but they were hardly ever used directly.

Instead, the OS and its SDK are versioned, and at build time you can also specify the earliest OS version your compiled binary needes to run on. So using this, the headers ensured the proper macros were selected automatically. (This is the same mechanism by which new/deprecated-in-some-version annotations would get set to enable weak linking for a symbol or to generate warnings for it respectively.)

And it was all handled initially via the preprocessor, though now the compilers have a much more sophisticated understanding of what Apple refers to as “API availability.” So it should be feasible to use the same mechanisms on any other platform too.

AshamedCaptain · 2024-09-28T23:06:30 1727564790

This does not solve the main issue as explained by TFA, which is that now applications that use a different "compiled OS version" cannot link with each other anymore. Your application X which declares to run on OS v.B cannot link anymore with application Y which declared to run on OS v.A , even when both are running under OS v.B .

In fact, what you describe is basically what every platform is doing.... as doing anything else would immediately break compatibility with all current binaries.

kazinator · 2024-09-29T01:47:37 1727574457

The problem is that secondary libraries, that are not glibc, will not have multiply defined functions for different size of off_t and will not have the switch in their header file to transparently select the correct set of functions based on what the client program wants.

Yet, somehow the article emphasizes that this is more of a problem for time_t than off_t.

This is believable and it's brobably it's because time_t is more pervasive. Whereas off_t is a POSIX thing involved in a relatively small number of interfaces, time_ t is ISO C and is all over the place.

On top of everything, lots of C code assumes that time_t is an integer type, equal in width to int. A similar assumption for off_t is less common.

_kst_ · 2024-09-29T06:55:44 1727592944

Code that assumes time_t is the same width as int is already broken, and won't work on typical 64-bit systems were int is 32 bits and time_t is 64 bits.

In any case, I'm not sure I've ever seen any such code.

Netch · 2024-10-07T13:31:16 1728307876

> Code that assumes time_t is the same width as int is already broken

You missed the point. It is meant that code _built for the same target_ "assumes" all instances use the same definition of time_t, and not about such a guarantee across different targets. That's why Gentoo solution to redefine target is cumbersome but finally working - as radical as axe.

eschaton · 2024-09-29T03:26:35 1727580395

Since the different sizes are ultimately a platform thing, they need to support multiple variants on those platforms, or limit their support to a new enough base set of APIs that they can rely on 64-bit types being there.

38 · 2024-09-29T01:26:20 1727573180

your comment has the tone of this being an elegant solution, but it reads to me like an awful hack. typeless macros are a nightmare I never want to deal with again.

eschaton · 2024-09-29T02:10:45 1727575845

These are very straightforward and do in fact serve as an elegant solution.

smackeyacky · 2024-09-29T02:45:16 1727577916

Elegant maybe in the 1980s before compilers got properly complicated. Now they’re a weird anachronism.

I’m old enough to have worked on c++ when it was a precompiler called cfront and let me tell you trying to work out whether your macro was doing something weird or the cfront processor did something weird was very frustrating. I swore off macros after that.

eschaton · 2024-09-29T03:19:44 1727579984

This is a case where you’re seeing the word “macro” and reacting to that when it’s really not warranted. It’s not using macros for complicated code generation, just selecting from a couple alternatives based on one or two possible compiler arguments.

I’m also old enough to remember CFront and this isn’t that.

ykonstant · 2024-09-29T15:00:01 1727622001

I am not an expert in these things, but the discussion here reminds me of https://news.ycombinator.com/item?id=41182917

eschaton · 2024-09-30T05:02:50 1727672570

And yet it shouldn’t because it’s not and doesn’t need to be anywhere near that degree of complexity.

This is exactly the sort of thing I mean when I said the previous respondent was reacting to the use of the term ‘macro’ rather than their actual complexity.

db48x · 2024-09-28T22:06:24 1727561184

Lol, that only works if you can force everyone on the platform to go along with it. It is a nice solution, but it requires you to control the c library. Gentoo doesn’t control what libc does; that’s either GNU libc or MUSL or some other thing that the user wants to use.

eschaton · 2024-09-28T22:23:17 1727562197

It’s entirely opt-in, Apple doesn’t force it. If you just do `cc mything.c -o mything` you get a binary whose minimum required OS is the version of the SDK you built it against, just as with any other UNIX-like OS. It’s just giving the developer the option to build something they know will run on an earlier version too.

And since it was all initially done with the preprocessor rather than adding knowledge to the compilers, there’s no reason individual libraries can’t handle API versioning in exactly this way, including things like differing `ino_t` and `off_t` sizes.

dmitrygr · 2024-09-28T22:13:40 1727561620

> requires you to control the c library

Which is why basically every other same operating system does that. BSDs, macOS, WinNT. Having the stable boundary be the kernel system call interface is fucking insane. And somehow the Linux userspace people keep failing to learn this lesson, no matter how many times they get clobbered in the face by the consequences of not learning it.

AshamedCaptain · 2024-09-28T23:09:36 1727564976

> no matter how many times they get clobbered in the face by the consequences of not learning it.

While I do think that the boundary should be set at the libc level just out of design cleanliness, I fail to see what the "consequence" of not doing so is. You're just changing the instruction binaries use for syscalls from a relative jump to a trap (or whatever), but you still have all the problems with data type sizes, function "versions" and the like which are what this discussion is about.

dmitrygr · 2024-09-28T23:28:02 1727566082

> I fail to see what the "consequence" of not doing so is.

TFA ?

AshamedCaptain · 2024-09-29T00:08:20 1727568500

No. How would it benefit TFA _at all_?

For all practical purposes Linux currently has _two_ stable boundaries, libc (glibc) and kernel. If you move it so that the stable boundary is only the kernel, you still have this problem. If you move it so that the stable boundary is only libc, you still have this problem.

In fact, TFA's problem comes from applications passing time_ts around, which is strictly a userspace problem, and the syscall interface is almost entirely ortogonal. Heck, the 32-bit glibc time_t functions probably use the 64-bit time_t syscalls these days...

clhodapp · 2024-09-28T22:58:58 1727564338

Making the stable boundary be C headers is insane!

It means that there's not actually any sort of ABI, only a C source API.

kazinator · 2024-09-29T01:51:23 1727574683

Headers can do the magic to select the right ABI more or less transparently, based on preprocessor symbols which indicate the selection: is a certain type 32 or 64.

This is similar to what Microsoft did in Win32 with the W and A functions (wide char and ascii/ansi). You just call MessageBox and the header file will map that to MessageBoxA or MessageBoxW based on whether you are a UNICODE build.

The character element type in the strings accepted by MessageBox is TCHAR. That can either be CHAR or WCHAR.

Once the code is compiled, it depends on the appropriate ABI: MessageBoxA or MessageBoxW; MessageBox doesn't exist; it is a source-level fiction, so to speak.

westurner · 2024-09-29T01:28:16 1727573296

cibuildwheel builds manylinux packages for glibc>= and musl because of this ABI.

manylinux: https://github.com/pypa/manylinux :

> Python wheels that work on any linux (almost)

Static Go binaries that make direct syscalls and do not depend upon libc or musl run within very minimal containers.

Fuschia's syscall docs are nice too; Linux plus additional syscalls.

asveikau · 2024-09-28T23:25:38 1727565938

> WinNT

This isn't true. There is an msvcrt in the OS, but it's mainly there for binaries that are part of Windows. The CRT is released as part of Visual Studio, out of band from the Windows release schedule.

Although CRT's place is in the layering a little different, because of so many things talking directly to Windows APIs.

dmitrygr · 2024-09-28T23:27:01 1727566021

IIRC, the CRT in windows makes NO system calls. Those are all in ntdll.dll, and in ntdll.dll ONLY

asveikau · 2024-09-28T23:36:23 1727566583

That's just a legacy of the strategy where NT native APIs were considered a private API, and you were meant to code against Win32 instead, to target both NT and 9x.

Syscalls are in ntdll, layered above that is kernel32 (today kernelbase), then most user mode code including CRT is above that. In 9x kernel32 were syscalls, in NT they were user mode shims above ntdll.

That's for most things. Things like gdi have kernel entry points too afaik.

Anyway my point is that the C library is developed out of band from that.

db48x · 2024-09-28T22:40:28 1727563228

It’s extra work, but I don’t know that it is necessarily insane. If libc was under the complete control of the kernel developers, then that gives other languages fewer options. Go famously (or infamously) uses certain syscalls without going through libc, for example. Sometimes the choices made for the C library just aren’t compatible with other languages. Frankly the C library, as it exists today, is insane. Maybe the solution is to split it in half: one for the syscalls and another for things like strings and environment variables and locales and all the other junk.

mort96 · 2024-09-28T23:43:12 1727566992

Splitting libc into one "stable kernel API" part and one "library with things which are useful to C programmers" part would honestly make a ton of sense. OSes which declare their syscall interface to be unstable would be more convincing if they actually provided a stable API that's intended for other consumers than C.

AshamedCaptain · 2024-09-29T00:16:54 1727569014

Frankly I don't see the point of complaining that libc is not useful for non-C consumers. Sure, there are some ancillary functions in libc you'll likely never call. But what's the issue with making syscalls through libc? Again, the only difference is what the very last call instruction is. If your runtime can marshal arguments for the kernel, then it surely can marshal arguments for libc. They are almost always practically the same ABI.

And you can avoid the need for VDSO-like hacks which is basically the kernel exposing a mini-libc to userspace.

db48x · 2024-09-29T15:45:42 1727624742

This is the reason why Rust compiles every program statically, except that the resulting “static” binary still has to link against libc.

account42 · 2024-10-02T14:05:28 1727877928

Graphical programs also have to link against libGL which also means linnking against libc, same for many other system libraries that make no sense to reimplement in your NIH language.

eschaton · 2024-09-28T22:26:41 1727562401

And yet it would still work out for Linux if musl, glibc, et al just adopted `API_VERSION_MIN` and `API_VERSION_MAX` macros themselves, it doesn’t actually have to be handled entirely at the `-isysroot` level.

grantla · 2024-09-28T19:01:20 1727550080

> A number of other distributions such as Debian have taken the leap and switched. Unfortunately, source-based distributions such as Gentoo don’t have it that easy.

For Debian it was extremely painful. A few people probably burned out. Lots of people pointed to source-based distributions and said "they will have it very easy".

Denvercoder9 · 2024-09-28T21:34:38 1727559278

> For Debian it was extremely painful.

Do you have any references that elaborate on that? From an outsider perspective, the time64 transition in Debian seemed to have been relatively uncontroversial and smooth. Way better than e.g. the /usr-merge.

jlarocco · 2024-09-28T22:54:26 1727564066

https://lwn.net/Articles/812767/

https://wiki.debian.org/ReleaseGoals/64bit-time

I'm continually impressed by Debian. My Debian systems are pretty boring, so maybe it's to be expected, but as an end-user neither the /usr merge or time64 transition broke anything for me, and I'm using Testing.

cbmuser · 2024-09-29T18:36:12 1727634972

> For Debian it was extremely painful.

I did the transition for m68k, powerpc and sh4 and partially for hppa with the help of some other Debian Developers here and there and I'm still alive ;-).

ajsnigrutin · 2024-09-28T22:31:23 1727562683

> For Debian it was extremely painful. A few people probably burned out. Lots of people pointed to source-based distributions and said "they will have it very easy".

'easy' in a way "just tell the user to rebuild everything in one go"? :)

darkhelmet · 2024-09-29T04:14:59 1727583299

Every time I see people struggling with this, I am so incredibly glad that I forced the issue for FreeBSD when I did the initial amd64 port. I got to set the fundamental types in the ABI and decided to look forward rather than backward.

The amd64 architecture did have some interesting features that made this much easier than it might have been for other cpu architectures. One of which was the automatic cast of 32 bit function arguments to 64 bit during the function call. In most cases, if you passed a 32 bit time integer to a function expecting a 64 bit time_t, it Just Worked(TM) during the platform bringup. This meant that a lot of the work on the loose ends could be deferred.

We did have some other 64 bit platforms at the time, but they did not have a 64 bit time_t. FreeBSD/amd64 was the first in its family, back in 2003/2004/2005. If I remember correctly, sparc64 migrated to 64 bit time_t.

The biggest problem that I faced was that (at the time) tzcode was not 64 bit safe. It used some algorithms in its 'struct tm' normalization that ended up in some rather degenerate conditions, eg: iteratively trying to calculate the day/month/year for time_t(2^62). IIRC, I cheated rather than change tzcode significantly, and made it simply fail for years before approx 1900, or after approx 10000. I am pretty sure that this has been fixed upstream in tzcode long ago.

We did have a few years of whackamole with occasional 32/64 time mixups where 3rd party code would be sloppy in its handling of int/long/time_t when handling data structures in files or on the network.

But for the most part, it was a non-issue for us. Being able to have 64 bit time_t on day 1 avoided most of the problem. Doing it from the start was easy. Linux missed a huge opportunity to do the same when it began its amd64/x86_64 port.

Aside: I did not finish 64 bit ino_t at the time. 32 bit inode numbers were heavily exposed in many, many places. Even on-disk file systems, directory structures in UFS, and many, many more. There was no practical way to handle it for FreeBSD/amd64 from the start while it was a low-tier platform without being massively disruptive to the other tier-1 architectures. I did the work - twice - but somebody else eventually finished it - and fixed a number of other unfortunately short constants as well (eg: mountpoint path lengths etc).

jabl · 2024-09-29T08:01:30 1727596890

AFAIU all 64-bit Linux ports have used 64-bit time_t, off_t, ino_t from the beginning. All this is about transitioning 32-bit Linux to 64-bit time_t.

Netch · 2024-10-07T13:36:45 1728308205

FreeBSD was also more radical in dealing with off_t which became 64 bits since 2.0. Linux still has rudiments of old size in its 32-bit versions.

> One of which was the automatic cast of 32 bit function arguments to 64 bit during the function call.

AFAIU this works only for unsigned arguments (as loading to %edi clears upper part of %rdi). SysV ABI spec for x86-64 says nothing about extending all values in registers or stack to full 64-bit value, and comment on boolean (only LS byte contains something significant) suggests this is a common rule.

nwallin · 2024-09-30T02:11:59 1727662319

> Every time I see people struggling with this, I am so incredibly glad that I forced the issue for FreeBSD when I did the initial amd64 port. I got to set the fundamental types in the ABI and decided to look forward rather than backward.

Hold on. You're saying that when amd64 happened, FreeBSD ported i386 time_t to 64 bits as well? That's wild. Were other 32 bit architectures ported to 64 bit time_t as well, like the Motorola 68000 and sparc32?

nobluster · 2024-09-28T18:20:53 1727547653

For a large legacy 32 bit unix system dealing with forward dates I replaced all the signed 32 bit time_t libc functions with unsigned 32 bit time_t equivalents. This bought the system another 68 years beyond 2038 - long after I'll be gone. The downside is that it cannot represent dates before the unix epoch, 1970, but as it was a scheduling system it wasn't an issue.

If legacy dates were a concern one could shift the epoch by a couple of decades, or even reduce the time granularity from 1 second to 2 seconds. Each alternative has subtle problems of their own. It depends on the use case.

suprjami · 2024-09-28T22:40:39 1727563239

If you can change the whole system from signed to unsigned, why not change to 64-bit?

Dylan16807 · 2024-09-29T05:47:03 1727588823

The main problem here is gradual upgrades. Signed and unsigned code act exactly the same on dates before 2038, so you can upgrade piece by piece at your leisure.

The suggestions for messing with epoch or the timescale wouldn't work nearly as well, for those just switch to 64 bit.

suprjami · 2024-09-29T21:07:18 1727644038

It sounds like the parent poster controls the whole system, so there is no gradual upgrade.

If they're changing signed time to unsigned time, it seems to me they can change the size of time as well.

seanhunter · 2024-09-29T06:39:27 1727591967

In the original BSD man pages, the "Bugs" section for "tunefs" had the famous joke "You can tune a file system, but you can't tune a fish." but according to "Expert C Programming"[1], the source code for this manpage had a comment next to the joke saying

   > Take this out and a UNIX Demon will dog your steps from now
   > until the time_t's wrap around.

Obviously back in the 70s when that was written 2038 seemed unimaginably far in the future.

[1] https://progforperf.github.io/Expert_C_Programming.pdf

kccqzy · 2024-09-28T17:42:25 1727545345

My biggest takeaway (and perhaps besides-the-point) is this:

> Musl has already switched to that, glibc supports it as an option. A number of other distributions such as Debian have taken the leap and switched. Unfortunately, source-based distributions such as Gentoo don’t have it that easy.

While I applaud their efforts I just think as a user I want to be over and done with this problem by switching to a non source-based distribution such as Debian.

wtallis · 2024-09-28T17:51:11 1727545871

It sounds like the difficulty for source-based distributions comes from trying to do an in-place upgrade that makes incompatible changes to the ABI. So changing to an entirely different distribution would be at least as disruptive (though possibly less time consuming) as doing a clean install of Gentoo using a new ABI.

akira2501 · 2024-09-28T19:13:06 1727550786

Starting a few years ago I partition all my drives with two root partitions. Once is used and the other is blank. For precisely this reason. Sometimes it's easier to just roll out a brand new stage3 onto the unused partition, build an entirely new root, and then just move over to that once it's finished.

The bonus is you can build your new system from your old system using just chroot. Very convenient.

jasomill · 2024-09-28T19:32:01 1727551921

Several OSes I use on a daily basis (FreeBSD, Fedora CoreOS, and Fedora Kinoite) adopt related strategies as part of their regular (binary) update processes:

https://wiki.freebsd.org/BootEnvironments

https://coreos.github.io/rpm-ostree/

naitgacem · 2024-09-29T09:18:22 1727601502

Android adopted an approach in a similar spirit.

https://source.android.com/docs/core/ota/ab

ordu · 2024-09-28T18:08:41 1727546921

There is an easy way to deal with this in Gentoo. Boot from usb or something, run mkfs.ext4 (or .whatever fs you use) on your / and /usr partitions, mount them, unpack stage3 on them, chroot into them and run `emerge $all-my-packages-that-where-installed-before-mkfs`.

You can install new copy of Gentoo instead of upgrading it incrementally.

seanhunter · 2024-09-30T06:58:29 1727679509

This doesn't take into account any binaries that are in users home directories or locally installed. So for example anything compiled as part of a python virtualenv for a developer or anything they built in rust using cargo etc.

Yes there are ways to deal with those things, but it doesn't change the fact that this transition is a big deal.

account42 · 2024-10-02T14:15:01 1727878501

In theory.

In practice there will be build failure, configation in /etc and sysmlinks managed outside the package manager (e.g. via eselect) that can affect builds and much more. Just reinstall is only simple right after you first installed the system.

robin_reala · 2024-09-28T18:21:19 1727547679

“easy”

ordu · 2024-09-28T18:30:43 1727548243

It is easy. It is just a tl;dr version of "how to install gentoo". :)

lnxg33k1 · 2024-09-28T18:34:40 1727548480

I hope I will get a day where I have the will to recompile everything in the next 14 years, I think I had the same install for the past 10 or so on the gentoo desktop. The only think I reinstalled recently has been the laptop to switch from arch to sid

account42 · 2024-10-07T08:29:18 1728289758

Wow you bother with a separate install (and different distro for you laptop). When I got a new laptop last year I just copied over my desktop install and adjusted things as needed. Setting everything up from scratch is too much work.

viraptor · 2024-09-28T21:10:46 1727557846

> by switching to a non source-based distribution such as Debian.

The distinction has more nuance. Source based distros like nixos don't have the same issue. The problem is more in how Gentoo builds/installs the packages than in building from source.

Also with third party closed source software, you're still going to have issues, even on binary systems. Actually you could even have issues with the first party packages if they're installed in separate independent steps.

n_plus_1_acc · 2024-09-28T17:44:52 1727545492

I'm no expert on C, but I was under the impression that type aliases like off_t are introduced to have the possibility to change then later. This clearly doesn't work. Am I wrong?

wtallis · 2024-09-28T17:53:16 1727545996

Source vs binary compatibility. Using typedefs like off_t mean you usually don't have to re-write code, but you do have to re-compile everything that uses that type.

rblatz · 2024-09-28T19:51:10 1727553070

But isn’t the point of a source only distribution like Gentoo, to build everything yourself? Who is running gentoo but also lugging around old precompiled stuff they don’t have the source for?

mananaysiempre · 2024-09-28T20:02:13 1727553733

As the post describes, the problem is that on Gentoo you can’t really build everything then switch binaries for everything, or at least that’s not what happens when you update things the usual way.

Instead, dependencies are built and then installed, then dependents are built against the installed versions and then installed, etc. In the middle of this process, the system can technically be half-broken, because it could be attempting to run older dependents against newer dependencies. Usually this is not a real problem. Because of how pervasive time_t is, though, the half-broken state here is potentially very broken to the point that you may be unable to resume the update process if anything breaks for any reason.

ordu · 2024-09-28T17:56:05 1727546165

It kinda work, but not in a source based distro. If you can atomically rebuild @world with changed definition of off_t, then there will be no problem. But source based distro doesn't rebuild @world atomically. It rebuild one package at time, so there would be inconveniences like libc.so has 64-bit off_t, while gcc was build for 32-bit off_t, so gcc stops working. Or maybe bash, coreutils, make, binutils or any other package that is needed for rebuilding @world. At this point you are stuck.

So such upgrade needs care.

jasomill · 2024-09-28T18:55:12 1727549712

As someone whose nearest exposure to a "source based distro" is FreeBSD, this sounds like madness, as it means a broken build could not only impair attempts to repair the build, but render the system unbootable and/or unusable.

And as someone who regularly uses traditional Linux distros like Debian and Fedora, the idea that a package management system would allow a set of packages known to be incompatible with one another to be installed without force, or that core package maintainers would knowingly specify incorrect requirements, is terrifying.

While I'm not familiar with Gentoo, my reading of this article suggests that its maintainers are well aware of these sorts of problems and that Gentoo does not, in fact, suffer from them (intentionally; inevitably mistakes happen just as they occasionally do on the bleeding edge of FreeBSD and other Linux distros).

AshamedCaptain · 2024-09-28T23:12:03 1727565123

FreeBSD does not use package management for the core system (i.e. freebsd itself, not ports). Gentoo does.

Athas · 2024-09-28T18:02:27 1727546547

Why not essentially treat it as a cross compilation scenario? NixOS is also source based, but I don't think such a migration would be particularly difficult. You'd use the 32-bit off_t gcc to compile a glibc with 64-bit off_t, then compile a 64-bit off_t gcc linked against that new glibc, and so on. The host compiler shouldn't matter.

I always understood the challenge as binary compatibility, when you can't just switch the entire world at once.

codys · 2024-09-28T18:36:17 1727548577

Nixos has it easier here because they don't require packages to be "installed" before building code against them. For Gentoo, none of their build scripts (ebuilds) are written to support that. It's plausible that they might change the embuild machinery so that this kind of build (against non-installed packages) could work, but it would need investigation and might be a difficult lift to get it working for all packages.

"treat it as a cross compilation scenario" is essentially what the post discusses when they mention "use a different CHOST". A CHOST is a unique name identifying a system configuration, like "x86_64-unknown-linux-gnu" (etc). Gentoo treats building for different CHOSTs as cross compiling.

andrewaylett · 2024-09-28T19:46:59 1727552819

NixOS isn't the same kind of source-based. At some level, even Debian could be said to be source based: there's nothing stopping you from deciding to build every package from source before installing it, and obviously the packages are themselves built from source at some point.

NixOS sits between Debian and Gentoo, as it maintains an output that's capable of existing independently of the rest of the system (like Debian) but is designed to use the current host as a builder (like Gentoo). Gentoo doesn't have any way to keep individual builds separate from the system as a whole, as intimated in the article, so you need to work out how to keep the two worlds separate while you do the build.

I think what they're suggesting winds up being pretty similar to what you suggest, just with the right plumbing to make it work in a Gentoo system. NixOS would need different plumbing, I'm not sure whether they've done it yet or how but I can easily imagine it being more straightforward than what Gentoo is needing to do.

tadfisher · 2024-09-28T20:08:58 1727554138

There absolutely will be problems with different Nix profiles that aren't updated together; for example, if you update some packages installed in your user's profile but not the running system profile. But this is common enough with other glibc ABI breakage that folks tend to update home and system profiles together, or know that they need to reboot.

Where it will be hell is running Nix-built packages on a non-NixOS system with non-ABI-compatible glibc. That is something that desperately needs fixing on the glibc side, mostly from the design of nss and networking, that prevent linking against glibc statically.

oasisaimlessly · 2024-09-28T23:02:56 1727564576

> Where it will be hell is running Nix-built packages on a non-NixOS system with non-ABI-compatible glibc.

This isn't a thing. Nix-built binaries all each use the hardcoded glibc they were built with. You can have any number of glibc's simultaneously in use.

codys · 2024-09-28T17:58:15 1727546295

> there would be inconveniences like libc.so has 64-bit off_t

glibc specifically has support for the 32-bit and 64-bit time_t abi simultaneously.

From the post:

> What’s important here is that a single glibc build remains compatible with all three variants. However, libraries that use these types in their API are not.

ordu · 2024-09-28T18:11:21 1727547081

Yeah, glibc was a bad example. Probably libz.so or something would be better.

codys · 2024-09-28T18:39:29 1727548769

Yep, agreed. Though it does expose an option here which would be "have glibc provide a mechanism for other libraries (likely more core/widely used libs) to support both ABIs simultaneously.

Presumably if that had been done in glibc when 64-bit time_t support was added, we could have had multi-size-time ABI support in things like zlib by now. Seems like a mistake on glibc's part not to create that initially (years ago).

Though if other distros have already switched, I'd posit that perhaps Gentoo needs to rethink its design a bit so it doesn't run into this issue instead.

o11c · 2024-09-28T19:35:12 1727552112

> have glibc provide a mechanism for other libraries (likely more core/widely used libs) to support both ABIs simultaneously

The linked article is wrong to imply this isn't possible - and it really doesn't depend on "GLIBC provide a mechanism". All you have to do is:

* Compile the library itself with traditional time_t (32-bit or 64-bit depending on platform), but convert and call time64 APIs internally.

* Make a copy of every public structure that embeds a time_t (directly or indirectly), using time64_t (or whatever contains it) instead.

* Make a copy of every function that takes any time-using type, to use the time64 types.

* In the public headers, check if everything is compiled with 64-bit time_t, and if so make all the traditional types/functions aliases for the time64 versions.

* Disable most of this on platforms that already used 64-bit time_t. Instead (for convenience of external callers) make all the time64 names aliases for the traditional names.

It's just that this is a lot of work, and little benefit to any particular library. the GCC 5 std::string transition is probably a better story than LFS, in particular the compiler-supported `abi_tag` to help detect errors (but I think that's only for C++, ugh - language without room for automatic mangling suck).

(minor note: using typedefs rather than struct tags for your API makes this easier)

AshamedCaptain · 2024-09-29T01:03:20 1727571800

This is a nightmare to implement (unless you go library by library and do it by hand, or you have significant compiler help), as e.g. functions may not directly use a struct with time_t but indirectly assume something about the size of the struct.

Note that for example the std::string transition did not do this.

jasomill · 2024-09-28T19:18:36 1727551116

This sounds like something similar to NeXTSTEP/macOS fat binaries, only with the possibility of code sharing between "architectures".

I like it, though it sounds like something that'd be unlikely to see adoption in the Linux world without the endorsement of multiple major distros.

codys · 2024-09-29T00:38:34 1727570314

Fat binaries (packing together 2 entirely different objects and picking one of them) could potentially work here, though I suspect we'd run into issues of mixed 32-bit/64-bit time_t things.

Another option that's closer to how things work with LFS (large file support) (mostly in the past now) is to use create different interfaces to support 64-bit time_t, and pick defaults for time_t at compile time with a macro that picks the right impl.

Also possible could be having something like a version script (https://sourceware.org/binutils/docs/ld/VERSION.html) to tell the linker what symbols to use when 64-bit time_t was enabled. While this one might have some benefits, generally folks avoid using version scripts when possible, and would require changes in glibc, making it unlikely.

Both of those options (version script extention, and LFS-like pattern) could allow re-using the same binary (ie: smaller file size, no need to build code twice in general), and potentially enable mixing 32-bit time_t and 64-bit time_t code together in a single executable (not desirable, but does remove weird link issues).

jasomill · 2024-09-29T18:04:45 1727633085

I was thinking along the lines of a simple extension to the Mach-O fat binary concept to allow each member of the fat binary archive to be associated with one or more ABIs rather than just a single architecture.

Then all the time_t size-independent code would go into a member associated with both 32-bit and 64-bit time_t ABIs, and all the time_t size-dependent code would be separately compiled into single-ABI members, one for each ABI.

For both executables and libraries, name conflicts between members associated with the same ABI would be prohibited at compile/link time.

The effective symbol table for both compile time and runtime linking would then be the union of symbol tables defined by all members associated with the active ABI.

Language-level mechanisms that allow compilers to recognize and separately compile time_t-dependent symbols would be required, ideally implemented in ways that allow such dependence to be inferred in most if not all cases.

While compilers would be free to size-optimize code by splitting functions into dependent and independent subparts, I see no immediate reason why this needs to be explicitly supported at link level.

Finally, as I imagine it, mixing 32- and 64-bit time_t ABIs would be prohibited — implicitly, as 64-bit time_t symbols aren't ever visible to 32-bit time_t code and vice versa — with code that needs to support non-native time_t values (for I/O, IPC, etc.) left its own devices, just like code dealing with, e.g., non-native integer and floating-point formats today.

Admittedly this sounds like a lot of unnecessary work vs. similar alternatives built on existing multi-arch mechanisms, but it's still an interesting idea to contemplate.

codys · 2024-10-08T17:36:11 1728408971

> Finally, as I imagine it, mixing 32- and 64-bit time_t ABIs would be prohibited — implicitly, as 64-bit time_t symbols aren't ever visible to 32-bit time_t code and vice versa — with code that needs to support non-native time_t values (for I/O, IPC, etc.) left its own devices, just like code dealing with, e.g., non-native integer and floating-point formats today.

It's pretty easy to imagine having both 32-bit time_t and 64-bit time_t in a single "executable" as long as the interfaces actually in use between 32-bit-time and 64-bit-time components don't use `time_t` (or derived types).

iow: if the fact that `time_t` is 32-bits is kept entirely internal to some library A used by some library B (by virtue of library A not having any exposed types with time_t that are used by library B, and library A not having any functions that accept time_t or derived types), there's nothing preventing mixing 32-bit-time_t code and 64-bit-time_t code in a single executable/process (in this theoretical case where we use a LFS (ala _FILE_OFFSET_BITS, etc) for time_t).

LFS had/has the same capability for mixing (with off_t being the type in question there).

codys · 2024-09-28T18:43:50 1727549030

nixos is an example of a distro that is source based and can do the atomic rebuild here because it has a way to build packages against other packages that aren't "installed". In nixos, this is because there are very few things that get "installed" as the global, only thing to use. But one could imagine that Gentoo could build something that would allow them to at least build up one set of new packages without installing them and then install the files all at once.

ArsenArsen · 2024-09-28T18:13:30 1727547210

this is just as much of a problem on binary distros mind you. just because there might be a smaller time delta does not mean that the problem is solved

jandrese · 2024-09-28T19:07:02 1727550422

It's only the first step of the puzzle. And arguably only a half step. As the article points out anytime that off_t is stuffed into a struct, or used in a function call, or integrated into a protocol, the abstraction is lost and the actual size matters. Mixing old and new code, either by loading a library or communicating over a protocol, means you get offsets wrong and things start crashing. Ultimately the changeover requires everybody to segregate their programs between "legacy" and "has been ported or at least looked over", which is incredibly painful.

smueller1234 · 2024-09-28T17:54:56 1727546096

They make it easier, but just at a source code level. They're not a real (and certainly not full) abstraction. An example that'll be making it obvious: if you replace the underlying type with a floating point type, the semantics would change dramatically, fully visible to the user code.

With larger types that otherwise have similar semantics, you can still have breakage. A straightforward one would be padding in structs. Another one is that a lot of use cases convert pointers to integers and back, so if you change the underlying representation, that's guaranteed to break. Whether that's a good or not is another question, but it's certainly not uncommon.

(Edit: sibling comments make the same point much more succinctly: ABI compatibility!)

jonathrg · 2024-09-28T17:54:39 1727546079

It does work, the problem with ABI changes is that when you change it, you have to change it everywhere at the same time. By default there's nothing stopping you from linking one library built using 32-bit off_t with another library built using 64-bit off_t, and the resulting behaviour can be incredibly unpredictable.

devit · 2024-09-28T20:24:09 1727555049

The C library uses macros so that the referenced symbols are different depending on ABI (e.g. open becomes open64, etc.), but most 3rd party libraries don't bother with that, so they break if their API uses time_t/off_t.

raldi · 2024-09-28T18:15:35 1727547335

I think the main reason they’re introduced is to provide a hint to the programmer and maybe some typesafety against accidentally passing, say, a file descriptor.

account42 · 2024-10-02T14:18:08 1727878688

They are introduced to abstract ABI differences into a common API. They change when you move to a different system but chaning them on the same system was never the intent.

n_plus_1_acc · 2024-10-02T19:54:01 1727898841

That makes sense, thank you.

progbits · 2024-09-28T17:54:11 1727546051

Edit: nevermind others raise better points.

cataphract · 2024-09-28T18:17:02 1727547422

That's not the problem. If time_t was a struct with a single int32_t, you'd be in the same situation when you changed it to int64_t (ABI incompatibility: you need more space to store the value now).

In order not to have this problem you'd need the API to use opaque struts, for which only pointers would be exchanged.

beached_whale · 2024-09-28T17:52:37 1727545957

at the source level it does, but when you have compiled libraries it breaks.

SkiFire13 · 2024-09-28T17:55:08 1727546108

Yeah, the problem in general with type aliases is that they're just that, aliases, not proper types. This means that they leak all the details of the underlying type, and don't work for proper encapsulation, which is what's needed for being able to change what should be implementation details.

tsimionescu · 2024-09-28T20:15:32 1727554532

The problem here would be exactly the same regardless of what form this type took. The issue is fundamental to any low level language: the size of a type is a part of its public API (well, ABI), by definition. This is true in C++ with private fields as well, for example: if you add a private field to a class, or change the type of a private field with one of a different size, all code using that class has to be re-built. The only way to abstract this is to use types strictly through pointers, the way Java does.

mhandley · 2024-09-28T22:58:38 1727564318

"Let’s consider a trivial example:

  struct {
      int a;
      time_t b;
      int c;
  };

With 32-bit time_t, the offset of c is 8. With the 64-bit type, it’s 12."

Surely it's 16 not 12, as b needs to be 64-bit aligned, so padding is added between a and b? Which also kind of makes the point the author is trying to make stronger.

AshamedCaptain · 2024-09-29T00:55:42 1727571342

Most x86 ABIs do not mandate padding for 64-bit types, due to the lack of 64-bit loads (at the time).

EdSchouten · 2024-09-29T09:05:29 1727600729

Exactly. This is also highly problematic if you try to perform atomic operations against bare 64 bit integers on those systems, because the atomic instructions do require them to be 8 byte aligned. In C11 and later it’s not an issue, because _Atomic(int64_t)’s alignment may be stricter than int64_t.

This was also an issue in Go, because it tries to use the same alignment rules as C to make cgo work. There they also solved it by adding dedicated atomic types.

https://github.com/golang/go/discussions/47141

256_ · 2024-09-29T07:55:54 1727596554

I thought you were right, but I tested it on x86_64 GNU/Linux with this struct:

    struct ts {
            int32_t dword1;
            int64_t qword1;
            int32_t dword2;
    };

Writing -1 to all 3 members and printing out the individual bytes up to sizeof(struct ts), it produces this with -O0 and -O3:

FF FF FF FF 00 00 00 00 FF FF FF FF FF FF FF FF FF FF FF FF 00 00 00 00

With -Os, I get some garbage data after the last member (always those 2 byte positions, with different values every time):

FF FF FF FF 00 00 00 00 FF FF FF FF FF FF FF FF FF FF FF FF 45 76 00 00

I feel like I'm missing something obvious here. Or maybe my system's ABI is a weird exception.

AshamedCaptain · 2024-09-29T12:36:32 1727613392

In addition to the confusion about x86 and x86_64/amd64, you should also know that -O[s123] cannot possibly influence the padding of structures, or it would mean you could not link a library compiled with -Os with a program compiled with -O3 (or viceversa).

account42 · 2024-10-07T08:35:48 1728290148

In theory the compiler could change the padding of structures if it can prove that this never leaks out of the translation unit. Of course, printing the size or bits of the structure would inhibit that.

jabl · 2024-09-29T08:07:19 1727597239

Was this with x86_64? The point of the parent was about x86_32, I believe.

256_ · 2024-09-29T08:16:53 1727597813

Oh yes, I really was missing something obvious. And you did, too, since I mentioned the architecture in the first line of my post.

jabl · 2024-09-29T08:23:57 1727598237

I did see it, but was unsure if you meant you were compiling for a 32-bit target on a 64-bit host.

Netch · 2024-10-07T14:37:38 1728311858

All this suggests the insane Windows time (64 bits of 100ns periods since 01.01.1601 00:00 GMT as it would have been Gregorian) sometimes has it small advantages - both an excellent discretion and will work even the whole galaxy will be conquerred... ;))

somat · 2024-09-29T16:06:35 1727625995

Just pull the bandaid off, it will hurt a little, but not as bad as this endless agonizing about the problem.

For what it's worth, on openbsd this problem is fixed, all architectures, even the 32bit ones, have a 64bit time_t.

http://www.openbsd.org/55.html

wallstprog · 2024-09-30T15:25:36 1727709936

Briefly mentioned elsewhere in the comments, but C++11 had a similar issue around the transition from a copy-on-write (COW) to a small-string-optimization (SSO) implementation for std::string. If any type is more ubiquitous than std::string, I don't know what it could be, but the transition was reasonably painless, at least in my shop.

See https://gcc.gnu.org/onlinedocs/libstdc++/manual/using_dual_a... for more info.

BoingBoomTschak · 2024-09-28T17:54:19 1727546059

A very thoughtful way of handling a problem much trickier than the earlier /usr merge. Had thought about 2 but not 1 and 3. I also had kinda forgotten that 2038 thing, time sure is flying!

I must say, mgorny's posts are always a treat for those who like to peer under the hood! (The fact that Gentoo has remained my happy place for years doesn't influence this position, though there's probably some correlation)

wpollock · 2024-09-28T19:39:09 1727552349

I'm probably naive, but I see another way forward. If all the t64 binaries are static linked, they have no danger of mixing abis. After 2038, all the t32 code is broken anyway so there's no additional risk for going back to dynamic linking then. I feel if this was a solution the author would have mentioned it but I'm willing to look foolish to hear what others will say.

layer8 · 2024-09-28T19:50:48 1727553048

Static linking is probably impractical for applying security updates, as often you’d have to recompile/relink basically the whole system. In some cases it could also significantly increase memory usage.

GeorgeTirebiter · 2024-09-28T20:41:15 1727556075

with fast networks, huge disks, fast processors --- it seems wasteful to me to even consider shared libraries. Shared libraries is a technology that was useful when we were memory starved. We are no longer memory starved. So you replace the static binary? Big deal, size is not an issue (for 99% of the cases) given what we have today.

Recall, too, that the "link" step of those .o files is actually a "link / edit" step, where routines in the libraries not used are not linked.

layer8 · 2024-09-28T20:49:40 1727556580

It’s much more straightforward to ensure consistency with shared libraries, and not having to rebuild stuff. Wasting disk space, RAM, network bandwidth and processing time is what seems wasteful to me.

jiggawatts · 2024-09-28T22:10:41 1727561441

Static linking is the root cause for “modern” apps based on Electron taking minutes to start up and be useful. They’re statically linking almost an entire operating system (Chromium), an entire web server, server runtime framework, and an equivalent client framework for good measure.

On the fastest PC that money can buy this is somewhere between “slow” and “molasses”.

I miss the good old days when useful GUI programs were mere kilobytes in size and launched instantly.

cbmuser · 2024-09-29T18:42:20 1727635340

> with fast networks, huge disks, fast processors --- it seems wasteful to me to even consider shared libraries.

The problem is that you will always have to rebuild all packages that link against a certain library when a security update is released so that if the user installs a security update, they will have to download several gigabytes instead of just a few megabytes.

account42 · 2024-10-07T08:43:23 1728290603

Have you looked at the size of your average GUI toolkit lately. Dynamic linking is still very much needed.

Besides I like my faster processor and larger disks/ram actually making things faster for me and letting me do more instead of being wasted on developer lazyness.

messe · 2024-09-29T12:52:01 1727614321

Remember that this is for 32-bit systems. Size can be an issue there, especially on an embedded system.

troad · 2024-09-29T01:58:04 1727575084

Considering the most important remaining use case for 32-bit is embedded systems, wouldn't static linking be a non-starter due to space and performance constraints?

loeg · 2024-09-28T18:11:29 1727547089

The C standard does not require time_t to be signed (nor does POSIX). Just changing the 32-bit type to unsigned would (in some senses) extend the lifetime of the type out to 2106. You could at least avoid some classes of ABI breakage in this way. (On the other hand, Glibc has explicitly documented that its time_t is always signed. So they do not have this option.)

pwg · 2024-09-28T18:26:56 1727548016

While that avoids 2038 as a "drop dead" date for 32-bit time_t, it also removes the ability to represent any date/time prior to 00:00:00 UTC on 1 January 1970 using 32-bit time_t because you lose the ability to represent negative values.

Having all existing stored date/times that are currently prior to the epoch suddenly become dates post 2038 is also not a good scenario.

loeg · 2024-09-28T19:18:59 1727551139

> it also removes the ability to represent any date/time prior to 00:00:00 UTC on 1 January 1970 using 32-bit time_t

Yes, of course. This is probably not the main use of negative values with signed time_t, though -- which is just representing the result of subtraction when the operand happened before the subtrahend.

> Having all existing stored date/times that are currently prior to the epoch suddenly become dates post 2038 is also not a good scenario.

In practice, there are ~zero of these on systems with 32-bit time_t and a challenging migration path as we approach 2038.

cryptonector · 2024-09-28T19:48:22 1727552902

> Yes, of course. This is probably not the main use of negative values with signed time_t, though -- which is just representing the result of subtraction when the operand happened before the subtrahend.

This is definitely a bigger concern, yes. One has to be very careful with subtraction of timestamps. But to be fair one already had to be very careful before because POSIX doesn't say what the size or signedness of `time_t` is to begin with.

Indeed, in POSIX `time_t` can even be `float` or `double`[0]!

  time_t and clock_t shall be integer or real-floating types.

Though on all Unix, BSD, Linux, and any Unix-like systems thankfully `time_t` is always integral. It's really only size and signedness that one has to be careful with.

Thus one should always subtract only the smaller value from the larger, and cast the result to a signed integer. And one has to be careful with overflow. Fortunately `difftime()` exists in POSIX. And there's a reason that `difftime()` returns a `double`: to avoid having the caller have to deal with overflows.

Basically working safely with `time_t` arithmetic is a real PITA.

  [0] https://pubs.opengroup.org/onlinepubs/009696799/basedefs/sys/types.h.html

loeg · 2024-09-28T19:58:28 1727553508

> Indeed, in POSIX `time_t` can even be `float` or `double`[0]!

Standard C, yes. Newer POSIX (your link is to the 2004 version) requires time_t be an integer type: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sy...

> Though on all Unix, BSD, Linux, and any Unix-like systems thankfully `time_t` is always integral. It's really only size and signedness that one has to be careful with. Thus one should always subtract only the smaller value from the larger, and cast the result to a signed integer. And one has to be careful with overflow. Fortunately `difftime()` exists in POSIX. And there's a reason that `difftime()` returns a `double`: to avoid having the caller have to deal with overflows.

> Basically working safely with `time_t` arithmetic is a real PITA.

Yes.

cryptonector · 2024-09-28T20:06:05 1727553965

Yes, I know that was older POSIX. But we're talking about old code, the unstated context is portability over a long period of time, and I wanted to make a point :)

So use `difftime()`, don't assume signedness or size, but do assume that it's an integral type.

cryptonector · 2024-09-28T19:34:51 1727552091

Stored _where_ exactly?

umanwizard · 2024-09-28T18:17:48 1727547468

You would then lose the ability to represent times before Jan. 1st, 1970. Which is not just a theoretical concern; those times appear e.g. in databases with people's date of birth.

nobluster · 2024-09-28T18:26:12 1727547972

And a signed 32 bit time_t with an epoch of 1970 cannot represent dates before 1902. Using time_t to store legacy dates is not advisable - even if you ignore all the issues with time zones and changing local laws pertaining to offsets from UTC and daylight saving time.

pwg · 2024-09-28T18:30:44 1727548244

While true, that limitation has always existed, so everyone has already implemented whatever was necessary to represent dates earlier than that.

Changing 32-bit time_t to unsigned suddenly makes all dates from 1902 to Jan 1 1970 which were stored using time_t (even if it was non-advisable, it still will have occurred) appear to teleport into the future beyond 2038.

cryptonector · 2024-09-28T19:34:20 1727552060

That's alright, see, because I have no filesystems, no tarballs, no backups, no files on any kind of media with timestamps before 1970, and indeed, no one can except by manually setting those timestamps -- and why bother doing that?!

So any pre-1970 32-bit signed timestamps will be in... not spreadsheets, in what? In databases? No, not either. So in some sort of documents, so software consuming those will need fixing, but we're not going to see much teleporting of 1902-1970 timestamps to 2038-2106. I'm not concerned.

jasomill · 2024-09-28T20:02:47 1727553767

Many important software projects predate UNIX. Perhaps you want to create a Git repository for one of them, with historically accurate commit timestamps?

cryptonector · 2024-09-28T20:58:14 1727557094

Well, Git doesn't seem to set file timestamps when cloning. And as the sibling comment says, Git doesn't use 32-bit signed integers to store timestamps. So this is not a problem.

If, however, Git were to some day get an option to set file mtimes and atimes at clone time to the last-modified time of the files based on commit history, then you could always just use a 64-bit system where `time_t` is 64-bit.

account42 · 2024-10-07T08:59:30 1728291570

> Well, Git doesn't seem to set file timestamps when cloning.

This is intentional btw because otherwise with many build system things would not get rebuilt correctly when checking out an older snapshot.

cesarb · 2024-09-29T01:29:56 1727573396

More importantly than a git repository, you might have archives (a .tar or equivalent) of these software projects, and might want to unpack them while preserving the timestamps contained within these archives.

cryptonector · 2024-09-29T05:10:42 1727586642

Unix did not exist before 1970. Neither did tar. There are no tar files with files with timestamps before 1970 _unless_ you've used `touch -t 195001010000` or `utime(2)` or `futimes(3)` to set the files' time to before 1970. That seems pretty unlikely.

account42 · 2024-10-07T09:02:26 1728291746

If I create a new archive format today and use it to archive the contents of my disk or whatever USB stick I find do you not thing it might end up containing files with timestamps before today?

loeg · 2024-09-28T20:06:10 1727553970

Git represents timestamps in ASCII strings, like "12345." Not time_t. (I am not sure if it even allows negative integers in this format.)

jasomill · 2024-09-29T18:34:10 1727634850

I was just using Git as an example of an application where pre-1970 timestamps could be useful. But as it turns out,

  $ git commit -m foo --date='1969-12-31 00:00:00'
  [main 25b6e63] foo
   Date: Sun Dec 30 23:00:00 2012 -0500
   1 file changed, 0 insertions(+), 0 deletions(-)
   create mode 100644 bar

bloak · 2024-09-28T19:12:42 1727550762

Serious question: do people use time_t for representing a date of birth?

To me that wouldn't seem right: a date of birth isn't a timestamp and you typically receive it without a corresponding place or time zone so there's no reasonable way to convert it into a timestamp.

(The other problem is that a signed 32-bit time_t only goes back to 1901. You might not have to deal with a date of birth before 1901 today, unless you're doing genealogy, of course, but until fairly recently it's something you'd probably want to be able to handle.)

umanwizard · 2024-09-28T19:41:45 1727552505

> Serious question: do people use time_t for representing a date of birth?

I have seen people in real life use seconds since the UNIX epoch to represent DOB, yes.

nomel · 2024-09-28T19:18:53 1727551133

My birth certificate has location, day, year, hour, and minute of birth. Birth is an event in time, perfectly represented with a (UTC) timestamp.

Merad · 2024-09-28T19:37:16 1727552236

Your birth is an event that happened at an instant in time, but very few systems concern themselves with that detail. The vast majority need to store birth _date_ and have no interest in the time or location.

nomel · 2024-09-28T22:48:51 1727563731

> have no interest in the time

Setting the bottom couple of bytes to zero achieves this, while maintaining nearly universal consistency with all other events in time that might need to be related with a packing of bits. People do it because it's what's being used at nearly every other level of the stack.

cryptonector · 2024-09-28T19:50:14 1727553014

> Birth is an event in time, perfectly represented with a (UTC) timestamp.

That's not the same thing as `time_t` though. `time_t` is UTC. UTC is not `time_t`.

loeg · 2024-09-28T19:19:44 1727551184

Databases do not and can not use system time_t. Consider how their on-disk state would be impacted by a change from 32-bit time_t to 64-bit! Instead they use specific or variable size integer types.

account42 · 2024-10-07T09:10:46 1728292246

Broad statements like this are universally untrue.

pezezin · 2024-09-28T22:28:50 1727562530

Does any database actually use time_t? PostgreSQL uses its own datatype, the number of microseconds since 4713 BC. I am sure that other databases do the same.

https://www.postgresql.org/docs/current/datatype-datetime.ht...

account42 · 2024-10-07T09:11:58 1728292318

How the timestamps are stored on disk is irrelevant if the code data from those databases ends up using time_t (e.g. because it needs to compare stored timestamps with the current time).

cryptonector · 2024-09-28T19:49:06 1727552946

Those databases might not even use `time_t`. It's their problem anyways, not the OS's.

petee · 2024-09-28T18:50:23 1727549423

Openbsd did just this for 32bit compat when they changed to 64bit time 12 years ago, and seems to have worked out fine.

brynet · 2024-09-28T20:32:02 1727555522

No, time_t is a signed 64-bit type on all architectures, 64-bit architectures have no "32bit compat".

https://www.openbsd.org/55.html

petee · 2024-09-28T22:12:54 1727561574

From the notes, this is what i was referring to, I think I just mistook the meaning -

Parts of the system that could not use 64-bit time_t were converted to use unsigned 32-bit instead, so they are good till the year 2106

account42 · 2024-10-07T08:50:13 1728291013

Ah yes, let's introduce subtle bugs or code that calculates the difference between two time_t's where the result can be negative.

If something is effectively always signed you can assume that things depend on it being signed even if the standard doesn't guarantee it. If you are building a Linux distribution you can't just handwave such code away as "technically it was already broken on this theoretical POSIX compliant platform that no one uses".

scheme271 · 2024-09-28T18:23:29 1727547809

If time_t is unsigned, how do times before the unix epoch get represented?

loeg · 2024-09-28T19:20:44 1727551244

They don't, much like they already do not. 32-bit time_t has always been finite, and 1970 was a long, long time ago. (See "in some senses" in my earlier comment.)

account42 · 2024-10-07T09:14:53 1728292493

1970 is not all that long ago when there are stil plenty of people born before that alive today. People with the attitude you are displaying should be kept far far away from time handling code.

ykonstant · 2024-09-29T15:18:42 1727623122

>1970 was a long long time

I thought it was a time_t! (≧◡≦)

ndesaulniers · 2024-09-28T19:55:38 1727553338

I think we should start putting details of the ABI in ELF (not the compiler flags as a string). Wait.. Who owns the ELF spec??

Then the linker and loader could error if two incompatible objects with different ABIs were attempted to be linked together.

For instance, I suspect you could have fields to denote the size of certain types. I guess DWARF has that... But DWARF is optional and sucks to parse.

SubjectToChange · 2024-09-30T15:29:11 1727710151

I think we should start putting details of the ABI in ELF

That basically implies name mangling in some form or another, i.e. good luck trying to convince C programmers of such a thing.

ndesaulniers · 2024-09-30T22:26:45 1727735205

I didn't find name managing to be the problem here. C symbols are mangled in mach-o.

dark-star · 2024-09-28T18:45:17 1727549117

Are there other distros where the switch to 64-bit time_t has already happened? What's the easiest way to figure out whether $distro uses 32-bit or 64-bit time_t? Is there something easier/quicker than writing a program to print `sizeof(struct stat)` and check if that's 88, 96 or 1098 bytes (as hinted in the article)?

SAI_Peregrinus · 2024-09-28T19:35:12 1727552112

NixOS switched. And they're source-based, but with actual dependency tracking and all the (insanely complex) machinery needed to allow different programs to use different C library versions simultaneously.

`printf("sizeof (time_t) = %zu, %zu bits", sizeof (time_t), sizeof (time_t) * CHAR_BIT);` gives you the size in bytes and in bits. Needs time.h, stddef.h, and stdio.h.

jonathrg · 2024-09-28T19:17:42 1727551062

printf '#include <stdio.h>\n#include <time.h>\nint main() { printf("time_t is %%zu-bit\\n", sizeof(time_t)*8); }\n' | gcc -x c -o timesize - && ./timesize && rm ./timesize

If you're wondering about a specific distro that you're not using right now - just look it up.

teddyh · 2024-09-28T19:03:01 1727550181

Like the article says, Debian has already switched.

fred_is_fred · 2024-09-28T18:03:57 1727546637

Besides epoch time and the LFS support mentioned, are there any other 32-bit bombs waiting for Linux systems like this?

tredre3 · 2024-09-28T19:13:15 1727550795

The ext file system uses many 32bit counters. Admittedly, version 4 fixed most of that (when formatted with the correct options).

kbolino · 2024-09-28T19:38:24 1727552304

In a similar vein, inodes can run out. On most conventional Linux file systems, inode numbers are 32 bits.

For many, this is not going to be a practical problem yet, as real volumes will run out of usable space before exhausting 2^32 inodes. However, it is theoretically possible with a volume as small as ~18 TiB (using 16 TiB for 2^32 4096-byte or smaller files, 1-2 TiB for 2^32 256- or 512-byte inodes, plus file system overheads).

Anticipating this problem, most newer file systems use 64-bit inode numbers, and some older ones have been retrofitted (e.g. inode64 option in XFS). I don't think ext4 is one of them, though.

lclarkmichalek · 2024-09-28T20:21:17 1727554877

It does happen in prod. Usually due to virtual FSes that rely on get_next_ino: https://lkml.org/lkml/2020/7/13/1078

Dylan16807 · 2024-09-29T06:06:26 1727589986

That method is wrapping and not checking for collisions? I would not call that a problem of running out then. It's a cheap but dumb generator that needs extra bits to not break itself.

lclarkmichalek · 2024-09-30T01:02:00 1727658120

There is a limit on reliable usage of the FS. Call it what you want. The user doesn't particularly care.

Dylan16807 · 2024-09-30T07:54:06 1727682846

What I'm trying to say is that the problem you're describing is largely a separate problem from what kbolino is describing. They are both real but not the same thing.

account42 · 2024-10-07T09:20:18 1728292818

... which also caused it's own issues with 32-bit applications without large file support that now fail to stat those files on 64-bit inode filesystems.

fph · 2024-09-28T18:30:15 1727548215

IPv4, technically, is another 32-bit bomb, but that's not Linux-specific.

samatman · 2024-09-28T19:16:16 1727550976

I wouldn't call this a bomb at all. Bombs are events, not processes.

Resource contention for IPv4 has been around for a long time, with a number of workarounds and the ultimate out of supporting IPv6. There has been, to date, no moment of crisis, nor do I expect one in the future.

It will just get steadily more annoying/expensive to use IPv4 and IPv6 will relieve that pressure incrementally. We're at least two decades into that process already.

poincaredisk · 2024-09-28T19:49:58 1727552998

Two decades in, and IPv6 is still the more annoying option.

I wish they were less ambitious and just increased address sizes when designing ipv6.

icedchai · 2024-09-29T02:31:17 1727577077

If more people bothered to configure IPv6 instead of complaining about it, V4 would be a thing of the past already.

o11c · 2024-09-29T03:20:59 1727580059

A lot of people don't have a choice.

Just last(?) year I finally switched to an ISP whose equipment supports IPv6. But I still can't actually use it since my wifi router's support for IPv6 somehow fails to talk to the ISP equipment.

Hm, there's a firmware upgrade ... (installs it) well, re-trying all those options again, it looks like one of them lets it finally work now. The others (including the default) still fail with arcane errors though!

account42 · 2024-10-07T09:26:10 1728293170

Interesting that the IPv6 adoption still differs so wildly for consumer connections aroundthe world. ISPs here have started to not give you a real public IPv4 address unless you ask for it. I expect it to become a paid option soon enough (main reason normal people ask for it is to connect to shitty IPv4-only corporate VPNs that don't like the ISPs NAT).

icedchai · 2024-09-29T13:45:18 1727617518

I mean, you are right. ISPs also need to stop complaining about supporting it and get off their butts. All enterprise routing equipment for the past 15+ years has supported IPv6. There is no excuse not to support it. Once you understand it, it is actually simpler than IPv4 + NAT.

panzi · 2024-09-28T21:54:34 1727560474

The only place where this is relevant for me is running old Windows games via wine. Wonder how wine is handling this? Might as well re-map the date for 32bit wine to the late 90s/early 2000s, where my games are from. Heck, with faketime I can do that already, but don't need it yet.

account42 · 2024-10-07T09:47:19 1728294439

Win32 doesn't use time_t internally and has had 64-bit FILETIME since Windows 95 (counting 100-nanoseconds since 1601) as well as 64-bit performance counters (ticks) which are often used by games. Similarly, SYSTEMTIME has a 16-bit unsigned year field so should be good for a while.

There is also an older GetTickCount() with 32-bit times but it is less problematic since a) it counts (milliseconds) since system startup and b) it is typically used to determine the time between two points (e.g. one frame to the next) so wraparound is not as much of an issue.

time_t is provided in the C-library but many Windows programs don't end up using it. The CRT also has had __time64_t for a long time now for this purpose.

16-bit Windows/DOS programs often use the FAT filetime format which can store dates until 2099.

TL;DR Windows/Wine internally does not need to change anything for a long time. Of course inevitably some applications will use 32-bit time_t or other 32-bit integers to store timestamps internally and will require workarounds.

ddoolin · 2024-09-28T21:01:08 1727557268

Why was it 32 bits to begin with? Wasn't it known that 2038 would be the cutoff?

wongarsu · 2024-09-28T21:15:37 1727558137

It has been that way since Unix V4 which released in 1973. Back then moving to something that breaks after 65 years was a massive upgrade over the old format that wrapped after two and a half years. And at least from the standpoint of Unix engineers 32 bits was more than enough: Nobody is using Unix V4 in production in 2024, never mind 2038.

Why it made it into Posix and wasn't updated is a different question that's a bit more difficult to answer

int_19h · 2024-09-28T21:11:30 1727557890

It was not exactly a big concern when this was designed in early 70s, especially when you consider that Unix itself was kind of a hack at the time.

modeless · 2024-09-28T21:16:35 1727558195

In the 1970s people probably would have laughed at the idea that Unix would still be running in the year 2038.

ddoolin · 2024-09-29T00:16:56 1727569016

I get that; I am surprised to find my own software running 7 years later, let alone ~60 and given the context.

PHGamer · 2024-09-29T05:09:00 1727586540

well sometimes the old stuff is better than the new stuff. it really is hilarious though. things in the software world have regressed in some ways especially when it comes to privacy and locking down data.

NegativeLatency · 2024-09-28T21:05:34 1727557534

Ultimately probably hardware? I suspect it’s also been like this for a long time.