Hacker News
Ask HN: Why was there ever fork()?
35 points by billpg on Oct 6, 2009 | 58 comments
Hi everyone. I'd like to figure out something that I don't quite understand: why did we ever have the fork function call?

Back in my youth when I was first taught the fork call, I got how it worked: two copies of the process return from the function call and they both continue in parallel. The only thing that bothered me was the long list of caveats my textbook discussed.

It told me that a copy of the process was made, except for file handles, and that the memory wasn't really copied until one of the two processes tried to modify something. It seemed all terribly complicated but I figured there was a good reason for it that I didn't yet understand, grasshopper.

Time passed and I started working in embedded systems and later coding for Windows. I never used fork beyond those juvenile programs I made. These OSs started new processes by passing an executable filename to the OS and telling it to start a new program. That new process started with a clean slate: no memory, empty stack, no open file handles except stdin/out/err. Simple.

Now, I've just been reminded of the fork call in Unix, and I'm prompted to ask: why was it ever there? Who wants the ability to fork when simpler ways of starting a new process exist?

Nearly all the uses of fork I've seen are followed by an exec call. So the OS goes to all that trouble setting up a duplicate process, only for all that hard work to be thrown away by running exec.

Even when concurrency within a program is needed, the thread model seems far more useful with a lot fewer complications.

So please, I need to know, why fork?



Because a 'fork' is the most natural way for one process to transform itself into another process while continuing to run by itself.

I actually think it is one of the most elegant system calls in unix.

Think of all the alternative clunky ways that OS's before unix had to use to start a process at a given depth into the process. Lots of flags to make sure that you started off where you left off in the 'parent', to recreate all or most of the state required for the child process. Fork passes all that state 'for free'. And copy-on-write makes it fast.

It's a bit like biology. Split the cell, then let them both specialize a bit towards what they have to become. The moment of splitting is almost 100% symmetrical, the only difference being who is the 'parent' and who is the 'child' process.

Other ways of starting new processes feel clunky in comparison: you have to specify a binary to run, you have to know all kinds of details about parameters to pass, and so on.

Fork essentially abstracts the creation of a sub-process to the absolute minimum.

Fork is atomic: it takes zero parameters and returns a single integer (or it fails for a simple reason: no more process slots).
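For anyone who hasn't seen it in a while, a minimal sketch of those semantics (the printouts are purely illustrative):

  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/types.h>
  #include <sys/wait.h>
  #include <unistd.h>

  int main(void) {
      pid_t pid = fork();    /* returns twice: 0 in the child, the child's pid in the parent */
      if (pid < 0) {
          perror("fork");    /* the one simple failure mode */
          exit(1);
      }
      if (pid == 0) {
          printf("child: pid %d\n", (int)getpid());
          exit(0);
      }
      printf("parent: forked child %d\n", (int)pid);
      wait(NULL);            /* reap the child */
      return 0;
  }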


Needing an exact copy of the current process is an atypical use case in my experience. And the typical use case of running another binary isn't made any more elegant by fork(), it just shuffles the complexity into exec().


Here's where using fork() without an ensuing exec() is very natural ...

A server process that must reply to hundreds of network requests per second but still remain single-threaded pretty much needs fork(). The persistent parent process sets up a common configuration, accepts incoming network requests, and calls fork() (which is very fast) to do the real work for each request in a child process.

That child's work involves looking up or recording relevant data, or doing a transaction, which may involve local memory, the filesystem, a DB, other processes, or other network services. The child must also reply to the network client. Doing all this in the single-threaded parent would preclude it from handling other requests and make it impossible to respond to hundreds of clients (unless the parent uses complicated asynchronous processing and the per-request work is mostly I/O and light on CPU ... but that complicates the server process). Starting a fresh process per client instead (as with CreateProcess() or spawn()/fork-exec()) complicates the architecture and is expensive, because the info needed for the client reply (e.g. locally derived/cached DNS results) can often be housed in the original parent process and simply inherited by the child. fork() leads to the simplest and most efficient architecture.
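Roughly, a stripped-down sketch of that pattern (error handling omitted; the port and the canned reply are placeholders for the real configuration and per-request work):

  #include <arpa/inet.h>
  #include <netinet/in.h>
  #include <signal.h>
  #include <sys/socket.h>
  #include <unistd.h>

  static void handle_request(int connfd) {
      const char reply[] = "hello\n";       /* stand-in for the real work */
      write(connfd, reply, sizeof reply - 1);
      close(connfd);
  }

  int main(void) {
      int listenfd = socket(AF_INET, SOCK_STREAM, 0);
      struct sockaddr_in addr = { 0 };
      addr.sin_family = AF_INET;
      addr.sin_addr.s_addr = htonl(INADDR_ANY);
      addr.sin_port = htons(8080);
      bind(listenfd, (struct sockaddr *)&addr, sizeof addr);
      listen(listenfd, 128);
      signal(SIGCHLD, SIG_IGN);             /* let the kernel reap the children */

      for (;;) {
          int connfd = accept(listenfd, NULL, NULL);
          if (fork() == 0) {                /* child inherits config, caches, connfd */
              close(listenfd);              /* the child never accepts connections */
              handle_request(connfd);
              _exit(0);
          }
          close(connfd);                    /* parent goes straight back to accept() */
      }
  }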


> the simplest and most efficient architecture

Actually I think in most cases the simplest and most efficient architecture is when your server creates a process for each specific type of thing it needs to do and then simply delegates incoming requests. http://www.okws.org/doku.php?id=okws is a good example.


Ah, but now it's a lot harder to share anything that isn't simply coming out of a shared library: pre-cached computations, any code that isn't in a shared library (like Perl or other interpreted code), any expensive computation that must be done per-process, etc. Forking off children is virtually impossible to beat on the efficiency front; people tend to grossly overestimate the expense of a "fork".

The only way to win with the approach you suggest is if you have some sort of massively complicated server that you only ever need some small part of at any given point in time, allowing you to do a lot of swapping in and out of memory. I've never seen such a beast and can't really come up with a non-contrived use case. YMMV, but it certainly isn't a common case.

(Yes, Perl isn't exactly interpreted, but from this point of view it certainly isn't a shared binary library.)


Although I believe you're wrong about the actual utility of this kind of data sharing across processes (and are also making exaggerated claims about complexity), I might change my mind if you provided a specific real-world reference example.


That's because you want to do an exec, which essentially transforms a running process into another process. I think people generally overestimate the complexity of fork on the part of the kernel, it's not all that bad. Create a new process slot, copy the VM configuration, set the 'copy-on-write' bit and the 'write-protect' bit on the pages in the memory image of the parent and return (twice!).

If all you want is a subprocess to do something interesting to the data you already have then fork is ideal. This is - or maybe was - a very common use case.

If you also need exec, then why not first abstract out the fork (you need that anyway) and do the exec in the child? It's one of the most elegant solutions to the problem I've seen.

Otherwise you get a whole bunch of functional duplication between system calls.

Already the 'exec' zoo is a good example of the kind of functional duplication you'd get. Adding 'forking' and 'non-forking' versions would not seem to improve matters much.


You make it sound so smooth and slick, and yes, having the child have all its state all ready and set up sounds like a good thing.

I suspect you are talking of using concurrency within the same program. The competing technology would be shared memory threads, of which I am more familiar. The only downside of this approach compared to fork seems (to me) to be the requirement to lock and signal when accessing shared resources.

Is there a downside to using fork within a process? Shirley if you've got a copy of memory (even if the actual copy is deferred) don't you also inherit the baggage of managing those resources too?

Lotsa questions there. Thanks for your original candid answer.


One thing to remember: when fork() was invented, shared-memory threads basically didn't exist. Everything that would be done with threads today, had to be done with either fork() or asynchronous event loops, because no mainstream OS had threading.

Another thing to remember: there's essentially no difference between a fork() and a thread. On Linux, they're both implemented by the same clone() syscall -- they just specify different flags about what should be shared versus what should be split and copied. Both are extremely cheap, and if you're going to implement threading, you might as well implement fork() because [on hardware with virtual memory] it's almost free.


> I suspect you are talking of using concurrency within the same program.

That's one common use case for 'oldies' :) But there are some others, such as splitting a protocol handler into two layers: all the setup, including environment, privileges and so on, is the same; then the two layers of the protocol split and start communicating via a pipe. Now it's technically two programs, but they happen to live in a single binary image.

> The competing technology would be shared memory threads, of which I am more familiar.

When unix was first developed, shared memory was not in the cards, and neither were threads. Unix is old, and fork is definitely showing its age. The initial machine that unix was developed on was a PDP-7; here are the specs:

18 bit words

4 K memory standard (magnetic cores, small ferrite beads with a bunch of wires running through them for addressing and a single read/write wire (ok, that's simplified))

16 K words for the machine Unix was written on originally (max was 64K words)

discrete parts only (no integration, just Q,D & R (and some 'C' ;) )

I'm not 100% sure if they already had 'fork' in place on that machine, but the next step up, the PDP-11, definitely had it (that's also the machine that C was born on; the first version of unix was written in assembler!).

> The only downside of this approach compared to fork seems (to me) to be the requirement to lock and signal when accessing shared resources.

They're completely different beasts I think. Fork is a way to guarantee a whole slew of things about the relationship between two processes, threads are essentially parallel executions within the same memory space (or on a single core machine a simulation of parallel execution).

Threads are both a great thing and a nightmare, depending on whether you've just solved the worst race condition bug in your career or whether you're still busy with it :)

Your 'lock and signal' sounds pretty simple, and in theory it is; in practice, efficient multi-threaded programs are amongst the hardest things to get right that I know of.

> Is there a downside to using fork within a process?

If you need it, you need it. If not don't use it. Like any other tool. Downsides to system calls are not really interesting, they only appear when you use the system calls in ways that they weren't meant to. What's in the core of a unix system is pretty much what needs to be there.

> don't you also inherit the baggage of managing those resources too?

Yes, absolutely. Any memory that was allocated in the parent is also allocated in the child, any file system handles that you have are present in both and so on.

In the case of file system handles there is usually some kind of convention on who gets to 'own' them but you can do interesting things by letting them be owned by both.


Forking predates threads. It also makes more sense as a system call, since a library writer can implement spawn() in terms of fork() and exec(), but not vice versa.
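As a sketch of that layering: a hypothetical spawn() (not any particular library's actual API) falls out of fork() and exec() in a few lines, while fork()'s split-and-continue semantics cannot be recovered from a spawn() primitive.

  #include <sys/types.h>
  #include <unistd.h>

  /* hypothetical library routine: run 'path' with 'argv', return the child's pid */
  pid_t spawn(const char *path, char *const argv[]) {
      pid_t pid = fork();
      if (pid == 0) {
          execv(path, argv);   /* only returns if it failed */
          _exit(127);
      }
      return pid;              /* -1 if the fork itself failed */
  }

A caller then waits on the returned pid with waitpid(); going the other direction, from spawn() back to fork(), has no equally direct construction.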

It's also still a good model when you want to run N copies of the same code concurrently, since the processes are isolated from each other, making it easier to reason about correctness. There are some wrinkles (primarily due to inherited global state such as filehandles), but they're reasonably well understood (by Unix+C programmers).

If you need significant shared state between concurrent paths of execution in the same code, then threads are probably easier.


> Forking predates threads.

Suddenly, everything becomes clear.


Maybe clearer than it should be then :) You would still need something like fork, even if unix had had threads from day 1. It's just that we wouldn't have been using fork+communication to simulate multi-threading in those cases where multi-threading is more appropriate.


On a related note, the Plan 9 system call rfork() lets you create new processes or lightweight processes (threads) with a single system call, deciding which resources are to be shared between the new processes. There are no two discrete entities called processes and threads, just processes!

"In Plan 9, fork is not a system call, but a special version of the true system call, rfork (resource fork) which has an argument consisting of a bit vector that defines how the various resources belonging to the parent will be transferred to the child. Rather than having processes and threads as two distinct things in the system, then, Plan 9 provides a general process-creation primitive that permits the creation of processes of all weights." - Rob Pike.

You can read more about it here - http://groups.google.com/group/comp.os.research/browse_threa...


That's actually how fork() is implemented today on Linux, too - the clone() system call can be used to implement either processes or threads (by specifying exactly what is to be copied and what is to be shared), and modern versions of glibc implement fork() by calling clone().
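A rough sketch of that flag difference, using the glibc clone() wrapper (illustrative only; a real thread library sets more flags, CLONE_THREAD among them):

  #define _GNU_SOURCE
  #include <sched.h>
  #include <signal.h>
  #include <stdlib.h>
  #include <sys/wait.h>

  static int worker(void *arg) { (void)arg; return 0; }

  #define STACK_SIZE (64 * 1024)

  int main(void) {
      /* the glibc wrapper runs fn on a caller-supplied stack */
      char *stack1 = malloc(STACK_SIZE), *stack2 = malloc(STACK_SIZE);

      /* thread-like child: shares memory, open files, fs info, signal handlers */
      clone(worker, stack1 + STACK_SIZE,
            CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | SIGCHLD, NULL);

      /* process-like child: shares nothing, gets a copy-on-write image instead */
      clone(worker, stack2 + STACK_SIZE, SIGCHLD, NULL);

      while (wait(NULL) > 0) {}   /* reap both children */
      return 0;
  }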


Plan 9 still is in many ways the most interesting thing to happen to the world of systems design after unix.

It deserves more attention.


So does the widely known OS called linux. man clone


Yea, but I had heard somewhere that the Linux clone was based on Plan 9's rfork.

"FreeBSD's rfork(2) and Linux's clone(2) system calls are modeled on Plan 9's rfork(2)"

source - http://catb.org/esr/writings/taoup/html/plan9.html


fork() is commonly used where you want to communicate with the sub-process using a pipe. Given how common piping from one process to another is in Unix, it's not surprising that fork() was implemented (nor that fork-then-exec was common).

It's true that Windows tends not to use this paradigm, but it is common in the Unix world specifically because of the simple ability to share file handles between a parent and child process. And the parent and child processes share almost everything else too (for example, they have the same environment settings).
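A minimal sketch of that sharing: the pipe is created once, before the fork, and both ends simply exist in the child afterwards.

  #include <stdio.h>
  #include <sys/wait.h>
  #include <unistd.h>

  int main(void) {
      int fds[2];
      pipe(fds);                        /* created before the fork, inherited by both */
      if (fork() == 0) {
          close(fds[0]);                /* child writes */
          const char msg[] = "hello from the child";
          write(fds[1], msg, sizeof msg - 1);
          _exit(0);
      }
      close(fds[1]);                    /* parent reads */
      char buf[64];
      ssize_t n = read(fds[0], buf, sizeof buf - 1);
      if (n > 0) { buf[n] = '\0'; printf("parent got: %s\n", buf); }
      wait(NULL);
      return 0;
  }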


fork() is great when you're writing a service. A pattern that I've used repeatedly is:

- The initial process reads in configuration files, sets up an environment, and opens a listening socket for the service. It then forks several times to create service processes. From this point, the initial process' only job is to start new service processes when/if they exit or when load increases, and to shut down the whole service when told to.

- The service processes run in a loop waiting for requests to come in on the socket, which they all share. The service can handle as many concurrent requests as you've got service processes, and they all operate independently. Thanks to copy-on-write, they all access the same configuration information stored in the initial process' memory. When a request comes in, the service process accepts it (which creates a new socket) and does some initial sanity checking to make sure it's a valid request, and then forks to create a handler process to actually process the request. It then goes back to listening for requests.

- The handler process is the workhorse. It gets the connection socket from its parent, and it's still got access to all of the config info. It's an independent process, so it's free to do whatever it needs to, without risk of impacting the continued operation of the service. Once it's done handling the request it can simply exit, freeing up whatever resources it consumed while handling the request.

In this pattern, the initial process and service processes have very simple jobs and very little code, which makes them easier to make bug-free and robust. Having lots of independent processes instead of threads adds robustness, because a crashing process can't take down the other processes in the service (unless it takes the whole machine down, of course.) This is rarely a problem in the initial or service processes, but the handler processes are exposed to the world and are much more likely to encounter unanticipated input, so they're the hardest to make robust. With the pattern, they don't need to be as robust, because they're allowed to exit unexpectedly without harming the service.
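A stripped-down sketch of the first two layers of that pattern (open_listener() is a stand-in for the config and socket setup described above, and the commented-out handle() for the real request work):

  #include <signal.h>
  #include <sys/socket.h>
  #include <sys/wait.h>
  #include <unistd.h>

  int open_listener(void);                /* stand-in: config + listening socket */

  static void serve_one_request(int listenfd) {
      int connfd = accept(listenfd, NULL, NULL);   /* all workers share the socket */
      if (connfd < 0)
          return;
      /* sanity-check the request here, then hand it to a fresh handler process */
      if (fork() == 0) { /* handle(connfd); */ _exit(0); }
      close(connfd);
  }

  int main(void) {
      int listenfd = open_listener();     /* set up once, inherited by everyone */

      for (int i = 0; i < 4; i++)         /* create the service processes */
          if (fork() == 0) {
              signal(SIGCHLD, SIG_IGN);   /* workers let the kernel reap handlers */
              for (;;) serve_one_request(listenfd);
          }

      for (;;)                            /* initial process: replace dead workers */
          if (wait(NULL) > 0 && fork() == 0) {
              signal(SIGCHLD, SIG_IGN);
              for (;;) serve_one_request(listenfd);
          }
  }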


That right there is the perfect use case for fork: spinning a task off to do its thing in isolation.


fork() is strictly superior to an API like Win32 CreateProcess(), because it can do more with less.

Processes normally inherit lots of context from their parent: the user identity, the window station (Win32-speak), security capabilities, I/O handles / console, environment variables, current directory, etc. The most logical way to inherit everything is to make a logical copy, which is very cheap owing to virtual-memory hardware.

Because of this, things that would normally need two APIs, one synchronous and one asynchronous, can be programmed easily: if you need the synchronous version, call it directly; otherwise, fork and call it, and wait on the pid (at a later point) if necessary in the parent.

And I rather vehemently disagree with you saying that the threading model has less complications than the process model. I believe there's almost universal agreement that the problem with threading is mutable shared state, and the process model avoids it.


To see some of the hoops that need to be jumped through to emulate fork() on systems that don't have it, and the limitations of doing so, check out the perlfork man page.

http://perldoc.perl.org/perlfork.html


fork() and exec() work well as separate system calls for the common situation where the child (but not the parent) needs to adjust something before executing the new program. Changing file descriptors to implement > and < in the shell, for example. It's common to see sequences like

  pid = fork();
  if(pid == 0){          /* child */
    close(1);            /* free fd 1, i.e. stdout */
    dup(pipefds[1]);     /* dup returns the lowest free fd: the pipe becomes stdout */
    exec(...);           /* the new program inherits the redirected stdout */
  }


Aye. This way, we can control a large number of aspects of the child's environment in which execve() is called, without having to have execve() do all that work for us. We can open files, change the session id, reparent the process to init, alter environment variables, lower process limits, change credentials, change the root directory... the possibilities are legion.

You wouldn't want to have to design a way to pass all of those things to execve(), would you?


An interesting question. I don't know the design decisions or whether the fork idea predated UNIX. But to me, it's just the sheer simplicity. Compare it to the basic process creation function on Windows, which takes 10 arguments: http://msdn.microsoft.com/en-us/library/ms682425(VS.85).aspx

Another useful application of the implicit descriptor sharing is http://en.wikipedia.org/wiki/Privilege_separation


Except fork() doesn't do what CreateProcess() does. I'd actually be pretty interested to see a fair comparison: a list of Unix system calls, with arguments, that covers all the functionality of CreateProcess(). Are you completely sure that it'll be smaller and more elegant?


Well, it is certainly the UNIX way: have many simple tools that do simple stuff, plus the ability to combine them for complex operations, rather than a few complex tools that do complex operations directly.

I'd say it is the more elegant design, whether or not it results in smaller or more elegant code.


If you reimplement Windows in Unix, it's unlikely to be more elegant. The question is if that would be a good idea.


> So the OS goes to all that trouble setting up a duplicate process only for all that hard work to be eliminated by running exec.

That's just it: creating a process that's an exact copy is the path of least resistance. Due to the way the VM system works on most modern hardware, it's much cheaper to make a copy-on-write clone of an existing virtual address space (you're mostly just copying page-table entries) than it is to build a brand new one.


Samba uses fork to create new instances of itself to handle each connected user, and I am often grateful that it does that instead of using threads. Since each user has a separate process, something going wrong inside the process means that the process dumps core, but none of the other users ever notice. Even the user whose smbd process crashed doesn't notice much except a brief delay while he reconnects.

No real need to monitor (except to try to catch the bug that caused the crash in the first place), and no need to manually restart anything. It all just keeps going.

Basically, a server process using fork has a lot of resilience built in. In contrast, a crash in a threaded process will kill all the threads at once, and all users feel the pain.


fork is really handy when you actually want multiple copies of your program (sharing only initial state) -- it can be used like creating a worker thread, but you don't have to worry about sharing (subsequent) state. This means you can exploit multiple cores without the complexity (and bugs) of shared-memory threads.

Actually, fork(2) has now evolved into clone(2) on Linux, so you can choose in quite a fine grained way what the threads/processes will share.

The separation of the functionality of spawn between fork and exec is surprisingly handy (even though people occasionally still come up with vfork(2)).


Here is a use case... We have a shell that must execute

  $ cmd_a | cmd_b | cmd_c
The simplest way for the shell to accomplish this request is to fork itself multiple times. Doing so without fork would be difficult. I figure that since multitasking and pipes are as old as eternity in the unix world, fork must have been an early necessity, and this use case might have something to do with their prominence, but then again I am just guessing.


If you'll excuse the pseudo code...

  Pipe pipe1 = new Pipe()
  Pipe pipe2 = new Pipe()
  NewProcess("cmd_a", null, pipe1)
  NewProcess("cmd_b", pipe1, pipe2)
  NewProcess("cmd_c", pipe2, null)


You are not passing the current shell state, though. So you could start 3 new shell processes with enough data to set the state up right and then start the individual processes, but that is both inelegant and inefficient.

Instead, if we fork thrice and each fork execs a command and does the pipeline plumbing, all three processes start simultaneously and inherit the exact same shell state, all for free. And copy-on-write means we did not waste any memory replicating the shell state for the 3 processes.
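A sketch of that plumbing for a two-stage pipeline, with ls and wc -l standing in for cmd_a and cmd_b (a real shell loops over N stages, but each stage gets the same fork/dup2/exec treatment):

  #include <sys/wait.h>
  #include <unistd.h>

  int main(void) {
      int fds[2];
      pipe(fds);
      if (fork() == 0) {                /* left side of the pipe: cmd_a */
          dup2(fds[1], 1);              /* its stdout becomes the write end */
          close(fds[0]); close(fds[1]);
          execlp("ls", "ls", (char *)NULL);
          _exit(127);
      }
      if (fork() == 0) {                /* right side of the pipe: cmd_b */
          dup2(fds[0], 0);              /* its stdin becomes the read end */
          close(fds[0]); close(fds[1]);
          execlp("wc", "wc", "-l", (char *)NULL);
          _exit(127);
      }
      close(fds[0]); close(fds[1]);     /* the shell closes its own copies */
      wait(NULL); wait(NULL);
      return 0;
  }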


Servers & Shells.

i.e. the things Unix systems are good at, but Windows systems are not.


Epic troll. Bravo.

But seriously: fork(2) is natural and mathematical. There is no IO involved. That is to say, when you call it, you don't have to activate some spinning mechanical thing and wait several million or billion cycles while it clatters and bumbles along, filling the higher caches with code and data.

fork is blazing fast; effectively an O(1) operation. It's just about as light-weight as process creation can get.

fork is useful. It allows one to manage complicated families of processes, complete with pre-fork and post-fork activity. Threads can't match it here. The only thing I can think of that surpasses the multiprocessing capability of a forking process is modern async IO. And then you have to implement all the management stuff by hand.

With all due respect, someone who claims to have embedded experience shouldn't have to ask hacker news about the benefits of fork, unless your embedded experience is all on Windows Mobile and its ilk, where CreateProcess rules the day.


> With all due respect,

"Epic troll. Bravo."

> someone who claims to have embedded experience shouldn't have to ask hacker news about the benefits of fork, unless your embedded experience is all on Windows Mobile and its ilk, where CreateProcess rules the day.

Why do I feel an urge to defend my experience to someone who openly insults me? I should just walk away.

(sigh)

I've worked on OS-less systems, where we have a short bit of assembly to pass control over to the "Main" function written in C. We implement concurrency by hooking into the timer-tick interrupt. I'd describe those as having two shared memory threads, one pre-emptive and one co-operative.

As well as that, I've used pSOS and VxWorks. These have pre-emptively switched processes, but I'd describe them as threads, as they share memory and have no protection between them. There's no memory management or virtual memory: load memory location 42 and 42 comes out on the address bus.


I'm not trying to insult, but on re-reading I guess one can see it that way, what with the troll remark and all. You have to admit your post had most of the effects your typical internet troll's would.

But on topic; your original post suggests that there are "simpler ways of starting a new process" and that using threads "seems far more useful with a lot less complications."

I think this is wrong on both counts. There is no simpler way to start a process, and using threads leads people towards manually reproducing many of the things fork provides for free, leading to more complicated and difficult to understand code.

I understand when the average programmer misunderstands fork, but systems programmers should know better. Since your experience is at the hardware level, and not operating systems, it makes more sense that you're not aware of the advantages of fork. But I still cannot fathom what you consider to be simpler than fork. Perhaps your definition of process creation differs from mine, and most others'? I'd like to understand more, in any case.


> Nearly all the uses of fork I've seen are followed by an exec call. So the OS goes to all that trouble setting up a duplicate process, only for all that hard work to be thrown away by running exec.

http://en.wikipedia.org/wiki/Copy-on-write


The OP was asking why we need to copy the state at all, not how to optimize it.


Because plenty of times what a child will do depends on what the parent was doing just before the fork, and in fact it may simply be a bit of code to run a background task related to the parent's foreground task.

Also, and this is a very important bit I think, fork started out before 'threads' were common, so another process to run the same code was a common solution. The communication between parent and child was through a unix pipe. That way you could write one single program, with all the state shared between the two sides of the fork, so both parties have access to all the context.

The copy-on-write bit set on all the pages with state in them in the child guarantees that fork is very fast and pushes the copying of the state as far into the future as it can get away with. So forking a process with 10M resident is as fast as forking a process with only 100K resident. When you modify the memory in the child you get to pay 'bit-by-bit' for the cost of the cloning of the parent, but never more than you actually need.

Clever programmers make sure that the state variables that are going to be modified by the child live close to each other.

An alternative to that is to use shared memory and mutexes, that way you can get pretty close to the 'threaded' model using only processes.


well, page-by-page technically


Don't forget that fork predates not just threads but virtual memory, so cloning all of the volatile memory of a process was really cheap (because there was only a few K of it). Look at the source code for fork in Lions' book on the version 6 kernel and you'll see how simple it used to be.


In the systems programming languages of old, fork is just easier than threading.

Fork model:

Step 1: Write a program to accept a single connection to a single TCP socket, then handle the request.

Step 2: Judiciously place a fork() call at the time of the new connection coming into the socket.

Step 3: Add an "if" statement to wait for another request if you happen to be the parent process after the fork.

You're done!

You just wrote a program capable of handling thousands of concurrent requests, with none of the concurrency nightmares that keep sensible men up at night. Going from the simplest case to the finished version was a two-line code change.
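In rough outline, the change amounts to something like this (connfd and handle() being whatever the single-connection version already had):

  if (fork() == 0) {      /* step 2: fork as the new connection comes in */
      handle(connfd);     /* child runs the original single-client code */
      _exit(0);
  }
  /* step 3: the parent falls through and loops back to accept() */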


If the new task can work in isolation, then yes, fork seems ideal. If the tasks need to interact, then threads seem (to me) more useful.

I've written web services in the past with databases storing data (like most web applications do) and I've often wished that the potentially many processes could just be multiple threads in a single process instead, so I could have them just share an array of objects without the overhead of a database server.


So I've got a process running as root. I want it to spawn a new process running as user1, in /var/fred, with a pipe to stdin and stdout directed to /var/log/greg.

First I create the pipe, then call fork. In the child, I chdir to /var/fred, open /var/log/greg, run dup2 on the pipe and on the handle to /var/log/greg, setuid to user1, and then finally call exec.
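As a sketch, wrapped in a function (user1_uid, path and argv here are placeholders for the real values; the caller has already created the pipe):

  #include <fcntl.h>
  #include <sys/types.h>
  #include <unistd.h>

  pid_t spawn_as_user1(uid_t user1_uid, const char *path, char *const argv[],
                       int pipefds[2]) {
      pid_t pid = fork();
      if (pid == 0) {
          chdir("/var/fred");
          int logfd = open("/var/log/greg", O_WRONLY | O_CREAT | O_APPEND, 0644);
          dup2(pipefds[0], 0);          /* pipe read end becomes the child's stdin */
          dup2(logfd, 1);               /* the log file becomes its stdout */
          close(pipefds[0]); close(pipefds[1]); close(logfd);
          setuid(user1_uid);            /* drop root last, after opening the log */
          execv(path, argv);
          _exit(127);                   /* only reached if execv failed */
      }
      close(pipefds[0]);                /* the parent keeps only the write end */
      return pid;
  }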

Show me an API that can do that without fork.

All the popen / spawn / system functions are not system calls but rather library functions which operate by calling fork.


> Show me an API that can do that without fork.

http://msdn.microsoft.com/en-us/library/ms682429%28VS.85%29....


Exactly the point - that call takes 11 parameters, many of which are complex structures themselves. Compare that to:

  pid_t fork(void);
  int execve(const char *filename, char *const argv[], char *const envp[]);

The idea is to have several simpler system calls that you can wire together to get the complex effect you need, rather than trying to build an ultimate CreateProcess function that can handle any case of infinite complexity.


System V was process based. You start processes, monitor processes, kill processes. Processes share data through message queues. Your entire architecture is process based, so you need a way to start processes.

Simple embedded systems don't have processes or threads. They are just loops. More complex embedded systems are real-time oriented and will use threads as the locus of control, because the whole memory space is shared amongst the threads. No need for processes at all.


The UNIX methodology is to have a large number of processes cooperating to provide functionality. "fork" is the fastest way to create a new process; so the reason for fork is to provide a system call that creates a process really fast.

Windows philosophy, on the other hand, is to have monolithic programs that solve everything by themselves. They infrequently need to start new processes, so fork is not viewed as important.


On the subject of fork() and exec: I have used that in the past so that the parent and the child can share I/O, thus allowing the parent to monitor the exec'd program more closely.

In the end I gave it up as a bad job; whilst the general idea of fork() is appealing, we found much "better" ways for fine-grained process control.


I vaguely (possibly incorrectly) recall that fork is the only way to create a new process, and that, no matter what system call you use, deep down, it still needs to call fork().

Someone with a better memory may correct me.

I'm not sure if this holds true for Windows though.


no fork() in windows


fork() by itself: inelegant. fork(), do ARBITRARY stuff in the child process, then exec(): now that does all kinds of things that a spawn()-type process creation cannot do.

fork() allows you to do anything before exec(), setting up lighter-weight process creation, and whatever flexibility the programmer desires.

I'd turn it around: why do the designers of spawn() or CreateProcess() think they've got the foresight to cover all of the bases for programmers? Why don't those systems do fork()/stuff/exec() to simplify?


Why is fork by itself inelegant?

Text editor: hit 'save' and the editor forks; the child saves in the background and quietly exits, no matter how long the save takes. Meanwhile the user continues to type more text in the foreground.

Just one example.
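A sketch of that, with write_buffer_to_disk() as a hypothetical stand-in for the editor's real save routine:

  /* on 'save': snapshot the buffer via fork and keep editing in the parent */
  if (fork() == 0) {
      write_buffer_to_disk();   /* the child sees a copy-on-write snapshot */
      _exit(0);
  }
  /* parent returns to the editing loop immediately */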


"fork() allows the ability to do anything before exec()..."

That is not entirely true. See the "CAVEATS" section of "man fork".


fork is a really elegant call.

Maybe you should be using higher level calls, but back when I was writing linux assembly programs fork was awesome.

There really aren't that many caveats to using it at all.



