
The only real solution (at least until and unless the kernel OOM killer is tuned to be massively more aggressive, which I doubt will happen) is to run a userspace OOM killer. If you don't like systemd-oomd, there are many alternatives, some of which even show a desktop notification when you're dangerously low on memory and when they actually kill processes.
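
For anyone curious what such a daemon boils down to, here's a minimal sketch: poll /proc/meminfo and SIGKILL the process the kernel itself rates worst (highest /proc/<pid>/oom_score) once MemAvailable drops below a threshold. The 512 MiB floor and 1 s poll interval are arbitrary assumptions on my part; real daemons like earlyoom are considerably more careful:

    /* Minimal userspace OOM-killer sketch. Error handling is mostly
       omitted; a real daemon would also mlockall() itself so it can
       still run when the system is thrashing. */
    #include <dirent.h>
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    static long mem_available_kib(void) {
        FILE *f = fopen("/proc/meminfo", "r");
        char line[256];
        long kib = -1;
        while (f && fgets(line, sizeof line, f))
            if (sscanf(line, "MemAvailable: %ld kB", &kib) == 1)
                break;
        if (f) fclose(f);
        return kib;
    }

    /* Pick the process the kernel itself considers the worst offender. */
    static pid_t worst_process(void) {
        DIR *d = opendir("/proc");
        struct dirent *e;
        pid_t worst = -1;
        long worst_score = -1;
        while (d && (e = readdir(d))) {
            pid_t pid = (pid_t)atoi(e->d_name);
            if (pid <= 1) continue;  /* skip non-PID entries and init */
            char path[64];
            long score;
            snprintf(path, sizeof path, "/proc/%d/oom_score", pid);
            FILE *f = fopen(path, "r");
            if (f && fscanf(f, "%ld", &score) == 1 && score > worst_score) {
                worst_score = score;
                worst = pid;
            }
            if (f) fclose(f);
        }
        if (d) closedir(d);
        return worst;
    }

    int main(void) {
        const long threshold_kib = 512 * 1024;  /* assumption: 512 MiB floor */
        for (;;) {
            long avail = mem_available_kib();
            if (avail >= 0 && avail < threshold_kib) {
                pid_t victim = worst_process();
                if (victim > 0) kill(victim, SIGKILL);
            }
            sleep(1);  /* the polling the parent comment would rather avoid */
        }
    }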

It might be interesting to explore better kernel APIs for things like userspace OOM killers: ideally, we'd want to guarantee that the userspace OOM killer is always prioritized in low-memory situations, and ideally it'd be possible to install low-memory event listeners into the kernel rather than having to poll.
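
The event-listener part actually exists in recent kernels: PSI (since 4.20, with triggers since 5.2) lets you register a memory-pressure trigger on /proc/pressure/memory and sleep in poll() until the kernel fires it, instead of busy-polling. Rough sketch, with arbitrary thresholds (150 ms of stall time per 1 s window):

    /* PSI memory-pressure trigger: ask the kernel to wake us when some
       tasks are stalled on memory for >= 150ms within any 1s window. */
    #include <fcntl.h>
    #include <poll.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("/proc/pressure/memory", O_RDWR | O_NONBLOCK);
        if (fd < 0) { perror("open"); return 1; }

        /* "some 150000 1000000" = 150000us of stall per 1000000us window */
        const char trig[] = "some 150000 1000000";
        if (write(fd, trig, strlen(trig) + 1) < 0) { perror("write"); return 1; }

        struct pollfd pfd = { .fd = fd, .events = POLLPRI };
        for (;;) {
            if (poll(&pfd, 1, -1) < 0) { perror("poll"); return 1; }
            if (pfd.revents & POLLERR) { fprintf(stderr, "trigger gone\n"); return 1; }
            if (pfd.revents & POLLPRI)
                puts("memory pressure event: free something or kill something");
        }
    }

This is what systemd-oomd builds on; the "guarantee the killer itself survives" half of the wish is still mostly up to mlockall() and oom_score_adj.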



I disagree. The real solution is to do away with the need for OoM killer in the first place by turning off overcommit (in its current form anyway) and fixing the broken programs that rely on its behavior.
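
For concreteness: the knob for this already exists as vm.overcommit_memory=2 (strict accounting), where the commit limit is swap plus overcommit_ratio percent of RAM, and malloc fails up front instead of the OOM killer firing later. Normally you'd just run `sysctl vm.overcommit_memory=2`; the same thing via /proc, as a sketch (root required, ratio value illustrative):

    /* Equivalent of `sysctl vm.overcommit_memory=2 vm.overcommit_ratio=80`.
       Mode 2 = strict accounting: CommitLimit = swap + 80% of RAM, and
       allocations beyond it fail immediately rather than being overcommitted. */
    #include <stdio.h>

    static int write_proc(const char *path, const char *val) {
        FILE *f = fopen(path, "w");
        if (!f) return -1;
        int ok = (fputs(val, f) >= 0);
        return (fclose(f) == 0 && ok) ? 0 : -1;
    }

    int main(void) {
        if (write_proc("/proc/sys/vm/overcommit_memory", "2") != 0 ||
            write_proc("/proc/sys/vm/overcommit_ratio", "80") != 0) {
            perror("writing vm sysctls (root required)");
            return 1;
        }
        return 0;
    }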


In practice, programs crash when they receive a null pointer from malloc (whether by throwing an uncaught exception, by `if (!ptr) { abort(); }`, or by dereferencing the null pointer). So even if your solution were realistic, it would still amount to killing an essentially random process: whichever one happened to allocate after memory ran out. When you reach OOM situations and need to kill processes, there are probably better heuristics than "kill whichever process happened to allocate memory after we ran out".
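
To see why the failure point is so arbitrary, here's a toy demo of the current default: allocate 1 GiB blocks without ever touching them and count how many succeed. On a default-configured 64-bit box the loop typically sails far past physical RAM (it usually stops on address-space or VMA-count limits, not memory); exact numbers vary by system:

    /* Overcommit demo: allocate 1 GiB blocks and never touch them.
       Untouched anonymous pages cost (almost) nothing, so the default
       heuristic keeps saying yes; with vm.overcommit_memory=2 the loop
       would stop near the real commit limit instead. */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        size_t gib = 0;
        for (;;) {
            void *p = malloc(1024UL * 1024 * 1024);
            if (!p) break;  /* the "whoever allocates next" failure point */
            gib++;          /* deliberately leaked and never written */
        }
        printf("malloc said yes to %zu GiB before failing\n", gib);
        return 0;
    }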


So we're covering up bad programmer behavior by lying to them and then shooting other programs in the head when everything goes south. By default. Sorry if I feel like we should be able to do better than that.

Windows, for instance, doesn't have this kind of overcommit. Allocations need to be backed by RAM+pagefile (though Windows will grow the page file if it can and needs to).
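
Concretely, on Windows a commit either gets charged against the commit limit (RAM + pagefile) up front or the allocation call itself fails synchronously; nothing gets killed later. A sketch (the exact error code varies; commit failures typically surface as ERROR_NOT_ENOUGH_MEMORY or ERROR_COMMITMENT_LIMIT):

    /* Windows commit-charge model: MEM_COMMIT either charges the whole
       range against RAM + pagefile or fails right here. */
    #include <windows.h>
    #include <stdio.h>

    int main(void) {
        SIZE_T sz = (SIZE_T)1 << 40;  /* 1 TiB, presumably over the commit limit */
        void *p = VirtualAlloc(NULL, sz, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
        if (!p) {
            printf("commit refused, GetLastError() = %lu\n", GetLastError());
            return 1;
        }
        VirtualFree(p, 0, MEM_RELEASE);
        return 0;
    }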


Things go very south on Windows if you ever use a lot of swap. I ran into this with a CAM process that used heavy swap.

You barely move the barrier: eventually the backing pagefile cannot grow any further (with a fast but small NVMe drive, that point can be reached within a few seconds). After that you get things like a Task Manager without fonts, because there is no memory left to load them...


Yeah, so, worst case it is exactly like Linux. However, at least on Windows, properly behaving software isn't being lied to about its allocations.


My argument was that _even without overcommit or swap_, we would probably want some kind of OOM killer, because "kill the process which happens to allocate first after we have run out" is probably among the worst heuristics for choosing which process to kill.


Except that's up to the application to behave that way, instead of some mysterious heuristic. If the programmer decides that terminating is the appropriate behavior, or is too lazy to do otherwise, then that is on the program. If programs have broken behavior, fix them; otherwise, what is the point of all this open source software anyway? I find it incredible that Linux developers are expected to routinely deal with API breakage on library updates, yet would rather have random processes terminated because the OS lies about memory than fix the badly behaving software!


And instead force every application which needs to store large amounts of temporary data to implement its own swapping mechanism? Don't you think we will end up with a lot of even less optimized swapping systems that way?
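
To be fair, on Linux an application doesn't need a full swapping mechanism: it can put bulk temporary data in a file-backed MAP_SHARED mapping, and the kernel will page it out to the app's own file under pressure rather than to system swap. Rough sketch (path and size are placeholder choices):

    /* App-managed "swap": back bulk temporary data with a private file
       so the kernel can write dirty pages there under memory pressure,
       independent of system swap. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void) {
        size_t sz = 8UL << 30;  /* 8 GiB of scratch space */
        int fd = open("/var/tmp/scratch.bin", O_RDWR | O_CREAT | O_TRUNC, 0600);
        if (fd < 0 || ftruncate(fd, (off_t)sz) != 0) { perror("backing file"); return 1; }

        /* MAP_SHARED makes the file the backing store: evicted dirty
           pages are written back to it rather than to system swap. */
        char *buf = mmap(NULL, sz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (buf == MAP_FAILED) { perror("mmap"); return 1; }

        buf[0] = 42;  /* use buf like ordinary memory */

        munmap(buf, sz);
        close(fd);
        unlink("/var/tmp/scratch.bin");
        return 0;
    }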


If you need swap, you need swap... You shouldn't borrow it from other programs' executable pages (without proper accounting).


IMO, a good start would be to fix the OOM killer to kill overcommitting processes first, ordered by size of overcommit (maybe matching the uid first).
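
As a userspace approximation of that ordering, you can scan /proc and rank processes by VmSize minus VmRSS. It's only a crude proxy for "overcommit" (it also counts file mappings and legitimate untouched reservations), but it shows the shape of the heuristic:

    /* Print a rough per-process overcommit proxy: address space held
       beyond what is resident, from /proc/<pid>/status. */
    #include <dirent.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        DIR *d = opendir("/proc");
        struct dirent *e;
        while (d && (e = readdir(d))) {
            int pid = atoi(e->d_name);
            if (pid <= 0) continue;  /* skip non-PID entries */
            char path[64], line[256];
            long vmsize = 0, vmrss = 0;
            snprintf(path, sizeof path, "/proc/%d/status", pid);
            FILE *f = fopen(path, "r");
            while (f && fgets(line, sizeof line, f)) {
                sscanf(line, "VmSize: %ld kB", &vmsize);
                sscanf(line, "VmRSS: %ld kB", &vmrss);
            }
            if (f) fclose(f);
            if (vmsize > vmrss)
                printf("%d overcommitted-ish by %ld kB\n", pid, vmsize - vmrss);
        }
        if (d) closedir(d);
        return 0;
    }

Sort the output numerically to get the ordering; a real implementation would also want the per-uid grouping suggested above.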



