What does this mean? I don't really understand the quote.

bri3d · on Feb 14, 2018

BPF started as the Berkeley Packet Filter, a language for declaring network packet filtering rules and eventually, a pretty good JITted virtual machine runtime for applying these rules quickly. However, it's since evolved to a generic filtering VM and been applied to system trace and other kernel-level filtering usecases.

Historically, threads on Linux were implemented as process-alike tasks and even had unique PIDs, which caused all sorts of hell for vanilla-POSIX threaded applications.

And, in Linux, code style differences in device drivers led to (rather than a codestyle enforced by an automatic formatter) "semantic patches" - patches that can apply to code regardless of its formatting.

twic · on Feb 14, 2018

Threads still have PIDs! I have had an issue on some heavily-loaded systems where a process dies but leaves a pidfile around, then its PID is reused for a thread in another process. When i run a restart script for the process, it reads the pidfile, confirms that the PID is alive, and then kills it - so shanking some random completely unrelated process.

I have modified the script to make some more careful checks before killing a PID it finds in a pidfile. Really, we should just use a proper process manager, but that probably won't happen soon.

quotemstr · on Feb 15, 2018

For ages now, I've wanted NT-style process handles. open(2)ing a process (maybe via its proc directory) should keep the corresponding process alive, even if as a zombie. This way, you'd be able to write perfectly robust versions of pkill without annoying TOCTOU PID reuse races.

Linus has rejected this mechanism due to the ability of an outstanding process handle to prevent process reclaim, permitting users to fill the process table --- but I think this argument is bogus: users not constrained by RLIMIT_NPROC can do that already, and we could count process handles against RLIMIT_NPROC.

Bonus points: allow operations like kill, ptrace, status-reading, etc., via the process handle FD instead of PID-accepting syscalls. This way, you'd be able to pass the FD around as a credential.

Even more bonus points: select(2) support for the FD for easy process death monitoring.

nanny · on Feb 15, 2018

The argument is not bogus, but it may not be realistic. But this is: https://randomascii.wordpress.com/2018/02/11/zombie-processe...

quotemstr · on Feb 15, 2018

So? How is that different from leaking any other kind of resource? There's nothing special about processes.

JdeBP · on Feb 15, 2018

We've talked before about FreeBSD's process descriptors and EVFILT_PROC kevents, ne?

loeg · on Feb 15, 2018

Process descriptors are unfortunately a little incomplete -- pdwait() was never implemented. They don't seem to be getting much use.

crest · on Feb 22, 2018

You don't need pdwait(). You can watch the process descriptor with kqueue to get the process exit status look for EVFILT_PROCDESC in https://www.freebsd.org/cgi/man.cgi?kevent.

JdeBP · on Feb 15, 2018

Threads are supposed to have PIDs, the PIDs of the processes that they variously belong to. What they are not supposed to have is different PIDs within a single process.

* http://jdebp.eu./FGA/linux-thread-problems.html

And you are relating this to a mechanism that has been known to be fundamentally flawed, in precisely this way, since the 1980s.

Smurfix · on Feb 15, 2018

You mean, like systemd?

wahern · on Feb 15, 2018

systemd still has a race condition when handling forking servers. There's no way to atomically send a signal to a cgroup, so what systemd does is read the PIDs in the cgroup and then iteratively send SIGKILL to each one. However, between reading the PID and sending the signal is the classic PID file race.

There is a way to atomically send a signal to a traditional process group, however. What I do for my daemons--at least, those which create subprocesses--is have the master become a new session and process group leader using setsid, open a master/slave PTY pair, and assign the new PTY as the controlling terminal. Child processes inherit the controlling terminal from the master, and if the master ever dies then _all_ children with the controlling terminal and process group will atomically get SIGHUP. As long as your subprocesses aren't actively trying to subvert you, it's bullet-proof behavior and more robust than any hack using cgroups.

There's still the issue of figuring out how to kill the master process. Ideally the master process never forks away from, e.g., systemd. (Not sure if systemd will try to kill it directly, first, before relying on its cgroups hack. Also, sometimes becoming a session leader requires forking if, e.g., the invoker already made you a process group leader.) But if the master must be independent, the best way is for the master to be super simple and just have it open a unix domain socket to take start/stop commands.

But let's presume we want a failsafe method in case the master has some sort of bug and we need to send it SIGKILL. (This is what I always assume, actually.) No matter what you do there'll always be a race. However, the least racy way to get the PID using a PID file is using POSIX fcntl locks. fcntl locks provide a way to query the PID of the process holding a lock (no reading or writing of files involved; just a kernel syscall). Importantly, if the process dies it no longer holds the lock and so querying the owner cannot return a stale PID. So when I use a traditional "PID file", I don't write the PID to the file, I just have the master lock it. There's still the race between querying the pid and sending a signal, but at least you're not leaving a loaded gun around (i.e. a PID file with a stale PID written to it).

This method is no worse than systemd and arguably better in some respects.

Oddly, I don't think systemd even bothers with process groups. That's a shame because it's really the only race-free to kill a bunch of related processes. systemd could provide the option to spawn a service within a process group and to send SIGKILL to the process group first before resorting to the cgroups hack to pick up any stragglers (i.e. those that intentionally left the process group). It could even provide the controlling terminal trick as an option. But it doesn't. AFAIK it just use the imperfect cgroups hack.

quotemstr · on Feb 15, 2018

Right. I implemented the same controlling pty approach in Buck. It's amazing how quickly forget robust older solutions and jump immediately to the new hotness.

twic · on Feb 16, 2018

I assume this was in response to "Really, we should just use a proper process manager, but that probably won't happen soon".

Yes, like systemd. Our servers even run systemd already!

Sadly, we don't have root on our servers, and user-level service support is broken on CentOS 7:

https://bugs.centos.org/view.php?id=8767

Maybe i could install another process manager of my own, and start it from cron. That really doesn't sound like fun, though. Is there a simple process manager which i can install and use without root privileges, and which won't make me restructure my whole deployment process?

adrianratnapala · on Feb 14, 2018

So then it is not correct to say a semantic patch is a lint rule. Rather semantic patches are both tools that attack the same problem from different angles. Arguably semantic patches are far superior if they can really be made to work.

quotemstr · on Feb 15, 2018

Linux uses Coccinelle (the semantic patch program) as a lint engine --- a job for which it's surprisingly well-suited. Try this:

  make coccicheck

blinkingled · on Feb 14, 2018

The Linux kernel treats threads no different than processes that share an address space which is unusual in UNIX land.

Semantic patch probably refers to coccinelle - look it up, it's pretty cool, makes it easy to deal with API changes, renames etc.

eBPF comes from extended Berkeley Packet Filter - where packet filter code is JITed to be fast. It's also used for dtrace like tracing along with kprobes to run profiling code in kernel safely.

caf · on Feb 14, 2018

The Linux kernel treats threads no different than processes that share an address space which is unusual in UNIX land.

They also share a signal handler table, open file table and current working directory. If a process-directed signal is sent to the process any of the threads with the signal unblocked can handle it. If a default signal disposition causes one thread to exit (eg SIGKILL or SIGSEGV) then all threads in the process exit. The exit_group(2) syscall causes all threads to exit (which is used to implement the POSIX exit() libc function).

Really, "threads mostly are just processes" hasn't been true for a very long time. Mostly the only ways in which that is still true is in scheduling, where Linux schedules each thread independently (what POSIX calls PTHREAD_SCOPE_SYSTEM) and in credentials, where threads each have their own set of current, real and effective uids / gids (and glibc has to use a signal-based hack to synchronise them).

blinkingled · on Feb 15, 2018

The point about the quote was when Solaris and BSDs were all doing special treatment of threads (LWPs/LKWT) for M:N threading, Linux since day one started with treating threads just like processes. The changes to special treat threads to solve problems (clean up manager threads hackery, signal handling improvments, clone syscall improvements, PID semantics etc.) came in later with NPTL as you said but by that time Solaris 9 had already adopted a 1:1 threading model.

d33 · on Feb 14, 2018

"Linux uses a 1-1 threading model, with (to the kernel) no distinction between processes and threads -- everything is simply a runnable task."

https://stackoverflow.com/a/809049

"semantic patches" probably refer to Coccinelle:

http://coccinelle.lip6.fr/

eBPF is extended Berkeley Packet Filter":

https://en.wikipedia.org/wiki/eBPF