That's all irrelevant for a language which is supposed to be a systems programming language, however that term is defined.


I don't think it is - less, bash, gcc, systemd, most interpreters, and Firefox will all pretty much just crash if malloc() fails, and they're all varying levels of things you'd want a "systems language" for. Unless you define "systems language" to mean one used for kernel or bare-metal embedded development (both cases where, even in C, you skip the standard library and write your own memory allocation functions, etc.), there are really very few cases where you (a) are allowed to allocate memory and (b) need to fail gracefully somehow if you can't.


I'm not saying that aborting on OOM never makes sense. I'm saying there are many situations where you shouldn't abort on OOM, and those situations overlap considerably with the situations in which a systems language is ideal[1]. So a systems language that doesn't allow for robust and fine-grained OOM handling isn't much of a systems language.

Example: Basically any highly concurrent network daemon that multiplexes many clients onto the same thread. In that case, you want much more control over where to put your recovery point. Even if the process doesn't abort, having the recovery point outside the kernel thread doing the work (i.e. in a controller thread, which was the recommended solution before catch_unwind) can be really inconvenient, and it also requires a lot of unnecessary passing of mutable state between threads, which is usually something you try to avoid (rough sketch below).

[1] Lua has robust support for OOM recovery, which is noteworthy because there are situations where you both want to handle OOM and would prefer a scripting language. Example: An image manipulation program with scriptable filters, where you don't want an operation that can't complete to take down your process or thread.
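
For the network daemon case, here's a rough sketch of what a local recovery point can look like in Rust, using the stable try_reserve API (the handler and its signature are made up for illustration):

    use std::collections::TryReserveError;

    // Hypothetical per-client handler: the recovery point sits right here, in
    // the task serving this one client, not in a separate controller thread.
    fn handle_request(payload_len: usize) -> Result<Vec<u8>, TryReserveError> {
        let mut buf = Vec::new();
        // Fallible allocation: on OOM this returns Err instead of aborting,
        // so only this client's request fails and the rest keep running.
        buf.try_reserve_exact(payload_len)?;
        buf.resize(payload_len, 0);
        Ok(buf)
    }

The caller can turn that Err into a "server busy" response for the one client that pushed things over the line; the process as a whole never aborts.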


On the other hand... who actually disables overcommit on their Linux servers, or on the machines they run image editors on, so that out-of-memory errors can actually be handled? I've never come across anyone who's changed the setting unless told to by a specific piece of software (almost always database software).


I do. Not only do I disable overcommit, I disable swap.

Most software is riddled with buffer overflows and other exploits, and yet it's rare that you come across an intruder while he's installing his rootkit. That doesn't mean it's not happening, just that people are ignorant about it; and that things can appear normal even with rootkits installed.

Like buffer bloat, people can be experiencing a problem without even realizing it's a problem. When software crashes under load they just think that it's _normal_ to crash under load.

Or when it crawls to a snail's pace under load because it's swapping like mad, they think that's normal, even though QoS would have been much better if the software had failed the requests it couldn't serve rather than slowing everybody down until they _all_ time out, sometimes even preventing administrators from diagnosing and fixing the problem.

OOM provides back pressure. Back pressure is much more reliable and responsive than, e.g., relying on magic constants for what kind of load you _think_ can be handled.


> Firefox will all pretty much just crash if malloc() fails

It's not nearly as simple as that. A general approach we try to use is the following.

- Small allocations are infallible (i.e. abort on failure) because if a small allocation fails you're probably in a bad situation w.r.t. memory.

- Large allocations, especially those controlled by web content, are fallible.

The distinction between small and large isn't clear cut, but anything over 1 MiB or so might be considered large. You certainly don't want to abort if a 100 MiB allocation fails, and allocations of that size aren't unusual in a web browser.
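
As a rough sketch of that policy (not Firefox's actual code, which is C++ with its own infallible/fallible allocation entry points; the threshold and helper name here are invented), in Rust it might look like:

    use std::collections::TryReserveError;

    // Illustrative cutoff only; as noted, the small/large line isn't clear cut.
    const LARGE_ALLOC_THRESHOLD: usize = 1 << 20; // ~1 MiB

    fn alloc_buffer(len: usize) -> Result<Vec<u8>, TryReserveError> {
        if len < LARGE_ALLOC_THRESHOLD {
            // Small: infallible path; the process aborts if this fails.
            Ok(vec![0u8; len])
        } else {
            // Large (e.g. a 100 MiB content-controlled buffer): fallible,
            // report the failure to the caller instead of aborting.
            let mut buf = Vec::new();
            buf.try_reserve_exact(len)?;
            buf.resize(len, 0);
            Ok(buf)
        }
    }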


That approach may make sense on the desktop, but it doesn't make sense for network servers. On a network server allocations tend to be small. Your 10,001st request might fail not because of a huge allocation but because it just pushed you over the line.

Aborting all 10,000 requests provides really poor QoS. You can get away with that if you're Google, because 10,000 requests is still minuscule relative to your total load across "the cloud". For almost everybody else it will hit hard, and among other things it makes you more susceptible to DoS.


So, to be clear, you're implying that you shouldn't write network services in Python, Ruby, PHP, Go, Java in practice [1], Perl, C++ without exceptions, JavaScript, etc. etc.?

[1]: http://stackoverflow.com/questions/1692230/is-it-possible-to...



