What, O_CLOEXEC? Well, yes, you really need to use the new syscalls that let you...

rand_flip_bit · on March 23, 2023

There is no such thing that allows you to avoid the race condition. There are a few issues:

- You can't control every call to open, and thus enforce O_CLOEXEC on all files (by default)

- Because of this, most files will be leaked into the child. This is likely not desirable, especially since some distros have low open-file limits for processes

- posix_spawn (the replacement for fork/exec) allows you to specify a list of file actions to perform, including opening and closing. This seems like a solution at first glance

- However, there is a TOCTOU race here: you first have to build the list of file actions with the posix_spawn_file_actions_* functions and only then call posix_spawn. Note that every file you want to close needs its own file action, which means you have to determine all the files that are open and manually add each one. That alone introduces the problem of determining all open files in your process.

- In a multi-threaded program it is possible for another thread to open a file after the file actions have been built but before posix_spawn is called, thus creating the potential for files to leak into the child.

- Even in a single-threaded program, it is possible for posix_spawn to invoke functions established with pthread_atfork, and atfork handlers are allowed to call async-signal-safe functions, including but not limited to open. Implementations aren't required to call atfork handlers, and modern glibc doesn't, but that is by no means guaranteed.

- Therefore, my argument is that posix_spawn cannot be used to create a process with a guaranteed minimal and clean state, and so you are back to square one with fork/exec.
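To make the window in the list above concrete, here is a minimal C sketch of the enumerate-then-spawn approach. `spawn_closing_known_fds` is a made-up name, not a real API, and the sketch assumes Linux, where `/proc/self/fd` lists the process's open descriptors. It is an illustration of the problem, not a fix for it.

```c
#include <dirent.h>
#include <errno.h>
#include <spawn.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>   /* waitpid(), for the caller */
#include <unistd.h>

extern char **environ;

/* Hypothetical helper, not a real API: spawn `path` with every fd >= 3 that
 * is open *at the time of the scan* closed in the child.
 * Returns 0 on success or an errno value on failure.
 * Assumes Linux: /proc/self/fd is used to enumerate descriptors. */
int spawn_closing_known_fds(pid_t *pid, const char *path, char *const argv[])
{
    posix_spawn_file_actions_t fa;
    int err = posix_spawn_file_actions_init(&fa);
    if (err != 0)
        return err;

    DIR *dir = opendir("/proc/self/fd");
    if (dir == NULL) {
        err = errno;
        posix_spawn_file_actions_destroy(&fa);
        return err;
    }

    struct dirent *ent;
    while (err == 0 && (ent = readdir(dir)) != NULL) {
        char *end;
        long fd = strtol(ent->d_name, &end, 10);
        if (end == ent->d_name || *end != '\0')
            continue;               /* skip "." and ".." */
        if (fd < 3 || fd == dirfd(dir))
            continue;               /* keep stdio; the scan's own fd is closed below */
        /* One close action per descriptor. */
        err = posix_spawn_file_actions_addclose(&fa, (int)fd);
    }
    closedir(dir);

    /* Race window: an fd opened by another thread between the scan above
     * and the fork inside posix_spawn() is not in the list, so it leaks
     * into the child unless it was opened with O_CLOEXEC. */
    if (err == 0)
        err = posix_spawn(pid, path, &fa, NULL, argv, environ);

    posix_spawn_file_actions_destroy(&fa);
    return err;
}
```

A caller would pass something like `"/bin/sh"` plus an argv array and then `waitpid` on the returned pid. Nothing in the sketch can close the gap between the scan and the spawn, which is the point being made above.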

The defaults for working with these APIs are just completely wrong, and it is very hard to get them right. The issues with fork/exec are numerous and nuanced, and most people simply aren't aware of the issues or don't care. There is a specific song and dance that needs to be performed when using fork/exec, and usually you want to hide all of that behind a library function... which will look something like CreateProcess. Sure, you might use the builder pattern to make it look nicer, but you really don't want the fork/exec split.

Here are a few other issues with fork/exec, non-exhaustive:

- Only signal-safe functions can be invoked between fork and exec. This means you need to be super careful with any stdlib code you invoke between these two (or better yet, just don't).

- Multithreaded programs cannot call fork without exec. Period. The state of objects such as mutexes and condition variables will be inconsistent. This is implied by the above, but I wanted to specifically call this out.

- Detecting that exec itself failed, as opposed to the program failing, requires an extra pipe marked with CLOEXEC. I have seen too much code using a magic exit code (which is wrong)

- Cleaning up the state of the child process and not accidentally creating a zombie is a bit tricky and there are some race conditions to be aware of. pidfd is not a solution if you need to support older kernels, although it helps tremendously.

- Interaction with signals is a bit messy.

- When fork is called, all pages are marked copy-on-write. This can be slow for processes with lots of memory allocated, and is completely redundant if your goal is to call exec. If other threads exist and are writing to memory, the pages they touch will be copied unnecessarily.

- Like I harped on earlier, files are inherited by default, not the other way around. You should be required to manually list the fds that you want the child to inherit (likely stdin, stderr and stdout only for 99% of cases).

- Distinguishing exec failing from exec succeeding but the process then failing requires a CLOEXEC pipe

- If exec fails, _exit must be called! You cannot terminate the child in any way that might run destructors or invoke callbacks/handlers, as these can perform I/O and would thus be observable.
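The CLOEXEC-pipe and `_exit` points in the list above can be sketched roughly as follows. `spawn_checked` is a hypothetical name, and the sketch assumes a platform that provides `pipe2` (Linux and some BSDs). It leaves out signal handling and everything else a real library wrapper would need.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork and exec `path`.
 * Returns the child's pid if exec succeeded, or -1 with errno set to the
 * reason fork or exec failed. The caller still has to waitpid() the child. */
pid_t spawn_checked(const char *path, char *const argv[])
{
    int pfd[2];
    if (pipe2(pfd, O_CLOEXEC) == -1)   /* both ends close on a successful exec */
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        int saved = errno;
        close(pfd[0]);
        close(pfd[1]);
        errno = saved;
        return -1;
    }

    if (pid == 0) {
        /* Child: only async-signal-safe calls from here on. */
        close(pfd[0]);
        execv(path, argv);
        int err = errno;               /* execv returned, so it failed */
        ssize_t w = write(pfd[1], &err, sizeof err);
        (void)w;                       /* nothing useful to do if this fails */
        _exit(127);                    /* not exit(): no atexit handlers, no stdio flush */
    }

    /* Parent: EOF means the write end was closed by exec, i.e. success. */
    close(pfd[1]);
    int child_errno = 0;
    ssize_t n;
    do {
        n = read(pfd[0], &child_errno, sizeof child_errno);
    } while (n == -1 && errno == EINTR);
    close(pfd[0]);

    if (n == 0)
        return pid;

    /* exec failed (or the read did): reap the child so it is not a zombie. */
    int status;
    while (waitpid(pid, &status, 0) == -1 && errno == EINTR)
        ;
    errno = (n == (ssize_t)sizeof child_errno) ? child_errno : EIO;
    return -1;
}
```

With this shape the parent never has to interpret an exit status to learn whether exec happened: a program that legitimately exits with 127 looks different from an exec that failed with ENOENT.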

CreateProcess is just much better, and the whole "it takes 12 parameters how awful" argument against it is 100% a non-issue. It isn't 1960 anymore, it's okay to have a function with a name longer than 6 letters and more than 3 parameters.

cryptonector · on March 24, 2023

> There is no such thing that allows you to avoid the race condition. There are a few issues:

> - You can't control every call to open, and thus enforce O_CLOEXEC on all files (by default)

Eh, you can if you open-code everything that might not call `pipe2(2)`, `accept4(2)`, etc. It's not great, but it is possible. You can also LD_PRELOAD a shim to make everything do that -- also not great, but possible.
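For illustration, the open-coded variants might look like the C sketch below. The wrapper names (`pipe_cloexec`, `open_cloexec`, `accept_cloexec`, `is_cloexec`) are made up for this example. The underlying calls and flags are the real ones, assuming Linux with `_GNU_SOURCE` for `pipe2` and `accept4`.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if fd has FD_CLOEXEC set, 0 if it does not, -1 on error. */
int is_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return (flags & FD_CLOEXEC) != 0;
}

/* pipe() followed by fcntl(F_SETFD) leaves a window in which another
 * thread can fork+exec and inherit both ends. pipe2() sets the flag
 * atomically at creation. */
int pipe_cloexec(int fds[2])
{
    return pipe2(fds, O_CLOEXEC);
}

/* open() has taken O_CLOEXEC directly for a long time; mode is only
 * consulted when flags contains O_CREAT. */
int open_cloexec(const char *path, int flags, mode_t mode)
{
    return open(path, flags | O_CLOEXEC, mode);
}

/* accept4() is accept() plus a flags argument; SOCK_CLOEXEC marks the
 * new connection close-on-exec at creation. */
int accept_cloexec(int listen_fd)
{
    return accept4(listen_fd, NULL, NULL, SOCK_CLOEXEC);
}
```

This only covers descriptors your own code creates. Anything opened inside a library you don't control is still the problem the parent comment describes, which is where the LD_PRELOAD shim comes in.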

You can also do the spawn server thing, which solves the problem, though you need to spawn the spawn server early, and if not, well, yeah.

You can also close (on the child side of vfork()) all the FDs you don't know. That mostly works, unless you are running in a context where you need to keep some FDs you don't know about.
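A rough C sketch of that close-everything-you-don't-know approach is below. `spawn_keeping` is a hypothetical helper, it uses `fork()` rather than `vfork()` to keep the example simple, and the `max_fd` bound is left to the caller as an assumption (a real implementation might derive it from the process's descriptor limit, or use `close_range(2)` on newer Linux kernels).

```c
#include <sys/types.h>
#include <sys/wait.h>   /* waitpid(), for the caller */
#include <unistd.h>

/* Returns 1 if fd appears in keep[0..nkeep), else 0. */
static int fd_is_kept(int fd, const int *keep, int nkeep)
{
    for (int i = 0; i < nkeep; i++)
        if (keep[i] == fd)
            return 1;
    return 0;
}

/* Hypothetical helper: fork, close every fd in [3, max_fd) that is not
 * listed in keep[], then exec `path`. stdin/stdout/stderr are left alone.
 * Returns the child's pid in the parent, or -1 if fork failed.
 * This sketch does not report exec failure beyond exit status 127. */
pid_t spawn_keeping(const char *path, char *const argv[],
                    const int *keep, int nkeep, int max_fd)
{
    pid_t pid = fork();
    if (pid != 0)
        return pid;                 /* parent, or fork error (-1) */

    /* Child: close() is async-signal-safe; errors on fds that were
     * never open (EBADF) are expected and ignored. */
    for (int fd = 3; fd < max_fd; fd++)
        if (!fd_is_kept(fd, keep, nkeep))
            close(fd);

    execv(path, argv);
    _exit(127);                     /* exec failed */
}
```

The `keep` list is exactly where this breaks down in the case mentioned above: if the context requires passing through descriptors you don't know about, there is nothing to put in it.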

We don't have a time machine. We only have these workarounds. It's not all bad.