It seems odd that the problem is in the *access* time of the files - why does a ...

cesarb · on Nov 19, 2022

> why does a font library (or, almost any program) care about the last read time of a file? Sure, the modification time is important, but it's pretty rare that code should care about when a file has been read before.

There's no separate system call for the modification time; a single system call (https://man7.org/linux/man-pages/man2/stat.2.html) returns the three times (atime, mtime, ctime) together. The font library probably wanted just the modification time (to check whether the font cache is stale), but it cannot get the mtime without also getting the atime (and ctime).

joosters · on Nov 19, 2022

Sure, but EOVERFLOW isn't coming from stat() in this case, is it? The man page states that this is returned for problems with file sizes too big for 32 bits, not for struct timespecs. Something else must be doing things with the atime, I would guess?

quietbritishjim · on Nov 19, 2022

I think stat() in the 32bit version of libc is making the 64 bit system call to get the value from the kernel and noticing that it would overflow. The man page for stat() [1] says that EOVERFLOW is a possible error value so that lines up.

[1] https://man7.org/linux/man-pages/man2/lstat.2.html

cesarb · on Nov 19, 2022

> I think stat() in the 32bit version of libc is making the 64 bit system call to get the value from the kernel and noticing that it would overflow.

I'm not certain I'm looking at the correct file, but that does seem to be the case, at least on latest glibc: stat redirects to fstatat(AT_FDCWD) (https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/uni...), and fstatat calls a 64-bit system call and fails with EOVERFLOW if any of st_ino, st_size, st_blocks, st_atim, st_mtim, st_ctim wouldn't fit in the 32-bit struct (https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/uni...).

Yes, st_ino too, which means it could also break on a filesystem with large enough inode numbers. Using a 32-bit userspace nowadays seems more problematic the more I look.

dezgeg · on Nov 20, 2022

> Yes, st_ino too, which means it could also break on a filesystem with large enough inode numbers.

Yep, been broken for 8 years in some Valve's games: https://github.com/ValveSoftware/Source-1-Games/issues/1685

cesarb · on Nov 20, 2022

Many thanks, it was exactly that bug report I was thinking of, but couldn't recall where I had seen it.

txtsd · on Nov 21, 2022

Thanks for this. It sent me down quite the rabbithole.

somat · on Nov 19, 2022

The nice thing about openbsd having no abi guarantees is that they can fix this problem the correct way. they made time_t 64 bit on all architectures.

The downside to having no abi guarantee is that you will not have old binaries to run in the first place, hope you remembered the source. sigh

All things considered, if you have the source to everything, abi is overrated, if you don't, it is vital.

extra thoughts: obenbsd is cool because they don't have or need the *64 file access functions. (fopen64, fseek64...) however this sucks when porting... because they don't have the *64 functions.

throwawaylinux · on Nov 20, 2022

> The nice thing about openbsd having no abi guarantees is that they can fix this problem the correct way. they made time_t 64 bit on all architectures.

The correct way is to create APIs that take a 64-bit time_t and migrate applications over to them. No ABI guarantee means the old APIs can be removed if they are a burden to implement, but obviously for the case of time_t they aren't, so sticking a warning message in there is sufficient for the next 15 years or so.

> All things considered, if you have the source to everything, abi is overrated, if you don't, it is vital.

ABI might be, but API isn't. Even within a single application, the correct way to do internal interface changes that affect a lot of code is generally to create the new one, move callers, then remove the old one. Certainly in a case like this where keeping the old APIs around is trivial.

And OpenBSD does not have the source code to everything, and even in ports, there tends to be an upstream and issues with porting.

somat · on Nov 20, 2022

changing

typedef time_t int32_t

to

typedef time_t int64_t

does not change your api

openbsd tends to be respectful to the api

however the binary interface changes every couple of weeks. and they have a flag day(breaking incompatible change) every year or so.

As such, actions that are unthinkable on linux, like an abi flag day. The openbsd project has gotten really good at handling, after all, if you break stuff all the time you get good at picking up the pieces. to misquote Raul Julia "For you, linux, the day your abi changed was the most important day of your life. But for me, it was Tuesday."

This means that the openbsd project is exceptionally unfriendly to binary only programs(commercial software), As much as I like openbsd I would not even try.

throwawaylinux · on Nov 20, 2022

> changing > > typedef time_t int32_t > > to > > typedef time_t int64_t > > does not change your api

Not sure I agree, because time_t itself is part of the API, and programs can use it for more than just calling your syscall, like in their own structures.

Linux has found they don't need these flag days, they're an ugly old sledgehammer that used to be quite common in systems programming, but Linux (and presumably Windows though I haven't seen the source code to make a judgement) really pioneered much more disciplined, thoughtful, and structured way to manage API and ABIs such that new versions can be brought in with little disruption and old versions can also be maintained usually with little burden to the code base. It's a better system all around IMO, even if you did decide to remove the old stuff right afterwards, the change process is just the right way to go. And keeping around the old stuff and not having to change the world or break your users is actually a good thing too, the ability to make changes less painful than these big hammer flag days makes things very flexible and adaptable.

joosters · on Nov 19, 2022

Yes, I read that man page. It states:

  EOVERFLOW: pathname or fd refers to a file whose size, inode number, or number of blocks cannot be represented in, respectively, the types off_t, ino_t, or blkcnt_t.  This error can occur when, for example, an application compiled on a 32-bit platform without D_FILE_OFFSET_BITS=64 calls stat() on a file whose size exceeds (1<<31)-1 bytes.

None of off_t, ino_t or blkcnt_t are to do with times, they are related to file size. The man page has nothing to say about EOVERFLOW and times. Perhaps the man page is out of date, or perhaps it is another syscall that is returning EOVERFLOW?

I'd be surprised if it was actual userspace code in the font library that was generating that errno. After all, if you care enough to spot an overflow in your calculations, you probably care enough to handle that error case better (and know enough about the situation to handle it properly). Something must be making a specific syscall, getting EOVERFLOW, then throwing it back up to the user. But is it really the ubiquitous stat() ?

phshift · on Nov 19, 2022

Note: The linux manpages cover the syscall interface of the linux kernel, not the glibc implementation. You can check the glibc source code yourself, but glibc will set errno for multiple reasons outside of the raw syscall, including for time overflows.

cesarb · on Nov 19, 2022

> Note: The linux manpages cover the syscall interface of the linux kernel, not the glibc implementation.

They cover both, but they focus more on the glibc wrappers. For instance, the manpage for stat(2) we're talking about says "On success, zero is returned. On error, -1 is returned, and errno is set to indicate the error.", which is not the syscall interface return value (the syscall does not know about errno, it returns the negative of what will end up in errno instead of -1). Another example is the manpage for exit(2), about the _exit() function (the exit() function, without the underscore, is at exit(3) since it's not a system call), which says "In glibc up to version 2.3, the _exit() wrapper function invoked the kernel system call of the same name. Since glibc 2.3, the wrapper function invokes exit_group(2), in order to terminate of the threads in a process."

pantalaimon · on Nov 19, 2022

It’s just another instance of glibc being glibc

userbinator · on Nov 19, 2022

In this case, just letting it roll over to the 70s would've probably not created this bug.

rbanffy · on Nov 20, 2022

True, but then you are giving incorrect information about that file.

I tend to prefer failing before returning wrong data. Who knows what the program using the function will decide to do based on that data...

account42 · on Nov 21, 2022

Failing before returning the wrong data is good when someone is around to fix the failures. Not so good if it bricks the application without a fix even though the bad data would not have been important, such as e.g. file access times.

rbanffy · on Nov 21, 2022

When in doubt, fail safely. Always.

There is no way to be sure the date in question is or is not important. In that case, instead of giving someone a lifetime exposure to ionizing radiation, or pausing the respirator because the last time the person took a breath is in the distant future, it's better to crash right away so a person can figure out what to do.

account42 · on Nov 22, 2022

Only a Sith deals in absolutes.

The original post is about a game distribution client. You don't think that the approach for health/safty SW and entertainment SW can have different approaches to error handling?

Also remember that the latter class of software is usually abandoned after a short time so there will be noone around to fix the bug once a user runs into your fail-early check. So yes, pretty please don't fail early in release builds of single-player offline games and other SW with similar characteristics - currupting some state that *might* result in problems is much preferable to intentionally making the game unplayable.

Its also a matter of how you fail and how much. Surely you don't want your computer to BSOD if there is any application error so maybe there are other cases where you want to limit the scope of the error handling too. E.g. zeroing out the atime if it overflows could make sense. Failing the whole stat call likely causes more probblems than it prevents.