There is nothing in the Lufthansa plans or policies that would make this flight or landing impossible, unreasonable, or unsafe. I imagine a landing at night is not a first, either.
This argument does not make sense - the kernel already needs to track per-process file descriptors. It just looks for the first hole instead of giving the "next" value.
Go's random map iteration does not apply here. Not only is this not an iterable map, the kernel has no problem providing this insertion guarantee so adding additional costly randomization has no benefit and just burns additional cycles.
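The guarantee is easy to observe from userspace. A minimal sketch (using /dev/null as a stand-in for any file):

```c
#include <fcntl.h>
#include <unistd.h>

/* Demonstrates POSIX's "lowest available fd" rule: close a descriptor
 * below the highest one, and the very next open() reuses that hole.
 * Returns 1 if the hole was reused, 0 otherwise. */
static int lowest_fd_reused(void)
{
    int a = open("/dev/null", O_RDONLY);
    int b = open("/dev/null", O_RDONLY);
    int c = open("/dev/null", O_RDONLY);
    if (a < 0 || b < 0 || c < 0)
        return 0;
    close(b);                        /* punch a hole below c */
    int d = open("/dev/null", O_RDONLY);
    int reused = (d == b);           /* open() must fill the lowest hole */
    close(a); close(c); close(d);
    return reused;
}
```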
Go would also be better off without it, but they are catering to a different audience and a different degree of specification, and apparently need to actively deter developers from ignoring documentation.
The correct term for this is not "developers ignoring documentation"; it's "ossification", or Hyrum's Law:
> With a sufficient number of users of an API,
> it does not matter what you promise in the contract:
> all observable behaviors of your system
> will be depended on by somebody.
I guess we got this "lowest available" rule because that's what the first implementation happened to do (it's the obvious thing to do if you have a single core). Then someone 'clever' noticed they could save three cycles by hard-coding and reusing the fd in their IO-bound loop, anyone who tried to implement fd allocation differently was instantly met with "your OS breaks my app", and thus the first implementation's behavior was permanently set in stone. To be clear, I'm not making any historical claims; this is pure speculation.
"Stupid developers should have rtfm humph" is not a useful position because it ignores this behavior ossification.
The Go map example is actually very relevant, it's an "anti-ossification" feature that makes the behavior match the spec. If the spec says iteration order is not guaranteed, but in practice people can rely on it being the same in some specific situation (say, in a unit test on a particular version of Go) then the spec is ignored and it breaks people's programs when the situation changes (e.g. Go version updates). This actually happened. Instead of giving in and ossifying the first implementation's details into the spec, Go chose the only other approach: Make the behavior match the spec: "iteration order is not guaranteed" == "iteration order is explicitly randomized". (They do it pretty efficiently actually.)
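The trick can be sketched in C. This is a toy analogy only, not Go's actual implementation (which picks a random start bucket and a random in-bucket offset per iteration):

```c
#include <stdlib.h>

/* Toy analogy of Go's anti-ossification trick: begin iteration at a
 * random offset so callers can never depend on a stable order. Every
 * element is still visited exactly once. */
static void randomized_order(const int *items, size_t n, int *out)
{
    size_t start = n ? (size_t)rand() % n : 0;
    for (size_t i = 0; i < n; i++)
        out[i] = items[(start + i) % n];
}
```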
As mentioned elsewhere, the file descriptor table is an array plus a bitmask - finding the next fd is a matter of finding the first unset bit, which is extremely efficient. And that's before considering that the file descriptor table is read-heavy, not write-heavy.
Should you want to have per-process file descriptor tables, you can do just that: just create a process without CLONE_FILES. You can still maintain other thread-like behaviors if you want. I doubt you'll ever sit with a profile that shows fd allocation as the main culprit, however.
> If the spec says iteration order is not guaranteed, but in practice people can rely on it being the same in some specific situation ... This actually happened.
If Hyrum's law held, the API would already be "ossified" at this point.
Instead, the Go developers decided to make a statement: "The language spec rather than implementation is authoritative". They broke this misuse permanently by making the API actively hostile, not by making it "match the spec" as it already did.
While one could interpret the current implementation as "anti-ossification", I interpret the action as anti-Hyrum's Law by choosing to break existing users in the name of the contract.
If we ignore POSIX for a moment, the kernel could avoid contending on the one-per-process fd map by sharding the integers into distinct allocation ranges per thread. This would eliminate a source of contention between threads.
In addition to violating POSIX's lowest-hole rule, it would break select(2) (more than it's already broken).
This sounds like premature optimization. FD availability is tracked in a bitmask, and finding the next available slot is a matter of scanning for the first unset bit under a spinlock. This is going to be extremely fast.
While you could shard the file descriptor tables for CLONE_FILES processes such as threads, you would likely complicate file descriptor table management and harm the much more important read performance (which is currently just a plain array index and pretty hard to beat).
You could also just create your processes (or threads) without CLONE_FILES so that they get their own file descriptor table.
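On Linux you can also do this after the fact via unshare(2). A minimal sketch (error handling trimmed):

```c
#define _GNU_SOURCE
#include <sched.h>
#include <fcntl.h>
#include <unistd.h>

/* Gives the calling task a private copy of its fd table, as if it had
 * been created without CLONE_FILES. Descriptors opened afterwards are
 * invisible to threads still sharing the old table. Returns 1 on
 * success, 0 on failure. */
static int detach_fd_table(void)
{
    if (unshare(CLONE_FILES) != 0)
        return 0;
    int fd = open("/dev/null", O_RDONLY); /* lands in the private table */
    if (fd < 0)
        return 0;
    close(fd);
    return 1;
}
```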
------
You would not use io_uring for things like that. Not only will you still use regular file operations on device files for various reasons; should you choose to use io_uring, you would want it to run your entire event loop and all your I/O rather than single operations here and there. Otherwise it just adds complexity with no benefit.
I don't see the big issue. There is no other way in Linux or POSIX to open a file asynchronously (not sure about closing). Dan Bernstein complained about that 20 years ago(?) and io_uring finally fixes it. Before that, runtimes with lightweight processes/threads (Erlang, GHC) used a POSIX thread pool to open files in the background. That seems just as messy as using io_uring, which at least keeps everything in the same thread.
It is important not to conflate POSIX requirements with expected behavior, especially for device files, which require very specific knowledge of their implementation to use (DRM ioctls and resources, anyone?).
You might think that, since a well-behaved game should not be opening/closing evdev fds during gameplay at all, this is clearly just an application bug. However, games are not the main users of evdev devices; your display server is! This bug causes input device closure during session switching (e.g. VT switching) to take abnormally long - on the machine I discovered the bug on, it adds over a second to the session switch time, significantly impacting responsiveness.
This is absolutely a kernel bug. I did not push the patch further as I had other priorities, and testing this kind of patch is quite time-consuming when it only reproduces in a measurable way on a single physical machine. Other machines end up with a much shorter synchronize_rcu wait and often have many fewer input devices, explaining why the issue was not discovered/fixed earlier.
call_rcu is intended for exactly this: wherever you do not want the writer to block. Alternative fixes include synchronize_rcu_expedited (very fast but expensive), identifying whether the long synchronize_rcu wait is itself a bug that could be fixed (might be correct), or possibly refactoring evdev (which is quite a simple device file).
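The two approaches differ like this in kernel-style C (an illustrative sketch with hypothetical names, not the actual evdev code; it will not compile outside a kernel tree):

```c
/* Writer-blocking teardown: close() stalls for a full RCU grace period. */
static void detach_client_sync(struct evdev_client_like *c)
{
        list_del_rcu(&c->node);
        synchronize_rcu();      /* may sleep for tens of milliseconds */
        kfree(c);
}

/* Deferred teardown: the writer returns immediately; RCU frees the
 * object from a callback once all pre-existing readers have finished. */
static void detach_client_async(struct evdev_client_like *c)
{
        list_del_rcu(&c->node);
        kfree_rcu(c, rcu);      /* requires a struct rcu_head member "rcu" */
}
```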
As for putting things in threads, I would consider it a huge hack to move open/close. Threads are not and will never be mandatory to have great responsiveness.
> As for putting things in threads, I would consider it a huge hack to move open/close. Threads are not and will never be mandatory to have great responsiveness.
The POSIX interface was invented for batch processing: long-running, non-interactive jobs. This is why it lacks timing requirements. All well-designed interactive GUI applications do not interact with the file system on their main thread. This is especially true for game display loops. The fundamental problem here is that they are doing unbounded work on a thread that has specific timing requirements (usually 16.6ms per loop). As I’ve said elsewhere, this bug will still manifest itself no matter how fast you make close(); it just depends on how many device files are present on that particular system. It’s a poor design. Well-designed games account for every line of code run in their drawing loop.
> This is absolutely a kernel bug.
I don’t think that is proven unless the original author can chime in. It’s your best guess and opinion that the author intended not to block on synchronize_rcu, but it’s perfectly possible they did indeed intend the code as written. synchronize_rcu is used in plenty of other critical system call paths in similar ways, and not every one of those uses is a bug. I would guess you might be slightly suffering from tunnel vision here given how the behavior was discovered.
If it is indeed the case that synchronize_rcu is taking up to 50ms, I would suspect there is a deeper issue at play on this machine. By search-and-replacing the call with call_rcu or similar you may just be masking the problem. RCU updates should not be taking that long.
> All well-designed interactive GUI applications do not interact with the file system on their main thread
I strongly disagree. A well-designed interactive GUI application can absolutely interact with the filesystem on its main thread without any impact to responsiveness whatsoever. You only need threads once you need more CPU time.
The POSIX interfaces provide sufficient non-blocking functionality for this to be true, and the (as per the documentation, "brief") blocking allowed by things like open/close is not an issue.
(io_uring is still a nice improvement though.)
> I don’t think that is proven unless the original author can chime in.
This argument is nonsense. Whether or not code is buggy does not depend on whether or not the author comments on the matter. This is especially true for a project as vast as the Linux kernel with its massive number of ever-changing authors.
> If it is indeed the case that synchronize_rcu is taking up to 50ms, I would suspect there is a deeper issue at play on this machine. By search-and-replacing the call with call_rcu or similar you may just be masking the problem. RCU updates should not be taking that long.
synchronize_rcu is designed to block for a significant amount of time, but I did not push the patch further exactly because I would like to dig deeper into the issue rather than making a textbook RCU fix.
> A well-designed interactive GUI application can absolutely interact with the filesystem on its main thread without any impact to responsiveness whatsoever. You only need threads once you need more CPU time.
The "well-designed" argument here is a bit No True Scotsman, and absolutely not true. Consider a lagging NFS mount. Or old hard drives; a disk seek could take milliseconds!
Real time computing isn't about what is normal or average, it's about the worst case. Filesystem IO can block, therefore you must assume it will.
> The "well-designed" argument here is a bit No True Scotsman, and absolutely not true.
This counterargument can be read as a mere No True Scotsman of "responsiveness", so it is not a very productive line of argument.
Should one be interested in having a discussion like this again, I would suggest strictly establishing what "responsive" means (which is a subjective experience), including defining when a "responsive" application may be "unresponsive" (swapping to disk, no CPU/GPU time, the cat ate the RAM), and avoiding terms like "well-designed" (I included it in protest of its use in the comment I responded to).
For example, failing to process input or skipping frames in gameplay would be bad, but no one would see a skipped frame in a config menu, and frames cannot even be skipped if there are no frames to be rendered.
> Should one be interested in having a discussion like this again, I would suggest strictly establishing what "responsive" means (which is a subjective experience)
This has been established for years; it is the basis of building real-time systems. For example, flight control systems absolutely must be responsive, no exceptions. What does that mean? That the system is guaranteed to respond to an input within a maximum time limit. POSIX applications may generally give the appearance of being responsive but absolutely are not unless specially configured. There is no upper bound on how long any operation will take to complete. This becomes apparent the minute your entire system starts to choke because of a misbehaving application. Responsive systems have a hard bound on worst-case behavior.
> A well-designed interactive GUI application can absolutely interact with the filesystem on its main thread without any impact to responsiveness whatsoever. You only need threads once you need more CPU time.
Hmm. If you call open()/read()/close() on the main thread and it causes a high latency network operation because that user happens to have their home directory on a network file system like NFS or SMB, your application will appear to hang. When you design applications you can’t just assume your users have the same setup as you.
> The POSIX interfaces provide sufficient non-blocking functionality for this to be true
POSIX file system I/O is always blocking, even with O_NONBLOCK. You can use something like io_uring to do non-blocking file system I/O, but that would no longer be POSIX.
> Whether or not code is buggy does not depend on whether or not the author comments on the matter.
That would depend on if you knew more about how the code is intended to work than the original author of the code. Do you presume to know more about how this code is intended to work than the original author?
> That would depend on if you knew more about how the code is intended to work than the original author of the code. Do you presume to know more about how this code is intended to work than the original author?
I am not sure if you are suggesting that only the author can know how code is supposed to work, that finding bugs requires an understanding of the code strictly superior to the author's, or that the author is infallible and intended every behavior of the current implementation.
Either way, this attitude would not have made for a healthy open source contribution environment.
> that finding bugs requires an understanding of the code strictly superior to the author's,
Evaluating whether or not something is a bug in a specific part of a system absolutely requires an understanding of the code's intent equal to the author's. You have found undesirable application-level behavior and attributed the cause to a specific line of code in the kernel, but it's possible you are missing the bigger picture of how everything is intended to work. Just because the latency has been tracked down to that line of code does not mean the root source of that latency is that line of code. Symptoms vs. root causes.