I prefer looking at output of "sar" which shows you nicely, in 10-minute increments, how idle the system was today (and I think usually this data captured is rotated daily for a month), and gives you also a good idea of whether you have processes waiting excessively for IO.
It also has a bunch of other options.
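For example (flags and paths as on the sysstat/RHEL setup discussed further down; adjust for your distro):
# sar -u
# sar -b
# sar -u -f /var/log/sa/sa06
The first gives today's user/system/iowait/idle split, the second the overall I/O transfer rates, and the third reads the saved file for the 6th of the month.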
On my own system, I also generally run "ps auwx" every 15 minutes, together with a scan of what queries the Postgres servers are running and a dump of web request activity (read the last 10k lines of the access logs, find out how long ago the first request was to get a rough hits-per-second figure and which areas of the application they hit). That way when someone says "hey, the system was slow around this time" I can go back and find out that some cron job had a dozen processes taking up tons of memory or blocking on IO.
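A minimal sketch of that kind of snapshotting as cron entries -- the /var/log/snapshots directory and the psql connection details are made up for the example, and percent signs have to be escaped in a crontab:
*/15 * * * * ps auwx >> /var/log/snapshots/ps-$(date +\%Y\%m\%d).log
*/15 * * * * psql -c 'select * from pg_stat_activity' >> /var/log/snapshots/pg-$(date +\%Y\%m\%d).log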
Some of those statistics also go into some RRD-based system which makes it easier to follow e.g. number of users logged in or number of Apache children based on weekday/time of day.
I'm a big fan of 'sar' as well. It's nice that it can also show you i/o wait for the sampled period. I also love vmstat, as you can use it to see everything that is happening with the system sampled every second if you like. The first two columns will show you the number of processes in the run queue as well as number of processes blocked on i/o.
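For example:
# vmstat 1
prints a line per second until interrupted ("vmstat 1 10" gives ten one-second samples and exits); watch the r and b columns.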
We must be running different versions of sar then, as "sar" by itself here (RHEL 5) shows information about the time split between user/system/waitIO/idle -- that certainly does not come from /proc/loadavg.
If you run "sar -q" you can get the load average information, but that's not particularly useful on its own, as you can't see whether a load average of 20 an hour ago was caused by heavy disk IO or a dozen CPU-bound processes.
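If you do want to line the two up for a past incident, something along these lines works (using the saved file for the 6th purely as an example; the daily files live under /var/log/sa on this box):
# sar -q -f /var/log/sa/sa06
# sar -u -f /var/log/sa/sa06
Compare the ldavg-1 column from the first against %iowait and %user from the second for the intervals in question.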
Nope, we're likely running the same version. The particular info you mentioned comes from /proc/stat (you're right that it's a different file), but again, it's read from the same sources top uses:
# lsb_release -d
Description: Red Hat Enterprise Linux Server release 5.3 (Tikanga)
# rpm -qf $(which sar)
sysstat-7.0.2-3.el5
# strace /usr/lib64/sa/sa1 1 1 &> results
# grep open results
open("/etc/ld.so.cache", O_RDONLY) = 3
open("/lib64/libtermcap.so.2", O_RDONLY) = 3
open("/lib64/libdl.so.2", O_RDONLY) = 3
open("/lib64/libc.so.6", O_RDONLY) = 3
open("/dev/tty", O_RDWR|O_NONBLOCK) = 3
open("/proc/meminfo", O_RDONLY) = 3
open("/usr/lib64/sa/sa1", O_RDONLY) = 3
open("/etc/ld.so.cache", O_RDONLY) = 3
open("/lib64/libc.so.6", O_RDONLY) = 3
open("/etc/localtime", O_RDONLY) = 3
open("/sys/devices/system/cpu", O_RDONLY|O_NONBLOCK|O_DIRECTORY) = 3
open("/proc/tty/driver/serial", O_RDONLY) = 3
open("/proc/interrupts", O_RDONLY) = 3
open("/proc/net/dev", O_RDONLY) = 3
open("/proc/diskstats", O_RDONLY) = 3
open("/var/log/sa/sa06", O_RDWR|O_APPEND) = 3
open("/proc/stat", O_RDONLY) = 4
open("/proc/meminfo", O_RDONLY) = 4
open("/proc/loadavg", O_RDONLY) = 4
open("/proc/vmstat", O_RDONLY) = 4
open("/proc/sys/fs/dentry-state", O_RDONLY) = 4
open("/proc/sys/fs/file-nr", O_RDONLY) = 4
open("/proc/sys/fs/inode-state", O_RDONLY) = 4
open("/proc/sys/fs/super-max", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/proc/sys/fs/dquot-max", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/proc/sys/kernel/rtsig-max", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/proc/net/sockstat", O_RDONLY) = 4
open("/proc/net/rpc/nfs", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/proc/net/rpc/nfsd", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/proc/diskstats", O_RDONLY) = 4
open("/proc/tty/driver/serial", O_RDONLY) = 4
open("/proc/interrupts", O_RDONLY) = 4
open("/proc/net/dev", O_RDONLY) = 4
# strace top -b -n 1 &> results
# grep open results
open("/etc/ld.so.cache", O_RDONLY) = 3
open("/lib64/libproc-3.2.7.so", O_RDONLY) = 3
open("/usr/lib64/libncurses.so.5", O_RDONLY) = 3
open("/lib64/libc.so.6", O_RDONLY) = 3
open("/lib64/libdl.so.2", O_RDONLY) = 3
open("/proc/stat", O_RDONLY) = 3
open("/proc/sys/kernel/pid_max", O_RDONLY) = 3
open("/etc/toprc", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/root/.toprc", O_RDONLY) = -1 ENOENT (No such file or directory)
If you don't already know about it, dstat is a good one too, and lightweight enough that it can be run on heavily loaded production systems w/out as much "top problem" going on.
This is the second or third article I've read explaining load average, and sad to say I still can't explain it.
All I know is that when it's inexplicably over 3-4, you can't determine why (no processes are using much CPU), and one in ten database queries is taking up to 5 minutes under normal load, that day will not be the best of your life. Well, what I've since determined is that a full storage device, or an overloaded I/O subsystem under your disks, can produce very high load along with the accompanying performance of doom.
> All I know is that when it's inexplicably over 3-4
Depends on the box. Say you:
a) have an app that spawns worker threads on demand and doesn't have a limit
b) have a modern, cheap Nehalem X5600 CPU.
2 sockets * 6 cores * 2 threads per core means you're only fully realizing the investment when you have a load average of 24 (assuming each of those hardware threads is 0% idle, which you can check with e.g. top or the quick check below).
Heavy IO would show up as high %iowait rather than %user.
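A quick sanity check of load against what the box can actually run in parallel (the 24 is just what the hypothetical 2 x 6 x 2 box above would report):
# grep -c ^processor /proc/cpuinfo
24
# cat /proc/loadavg
For purely CPU-bound work, a 1-minute figure at or below the first number means you're not yet oversubscribed.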
That happens when you have an I/O problem somewhere, so lots of processes wind up blocked on I/O and are counted toward the load even though they aren't using any CPU. The fact that normal queries take 5 minutes is confirmation of this.
As you've figured out, this can happen because of a full storage device (writes fragment as the filesystem squeezes data into small empty spaces, so what could have been one write becomes many, adding lots of seek time) or because of an overloaded I/O subsystem, where the prefetch/caching tricks stop working and everything slows down.
The best solution is to never get into this situation in the first place. The second best is to find a way to halt I/O to the afflicted device until the system gets sorted out, figure out what went wrong, then re-add load and get back to a running state. The worst is to let things limp along and hope that diminished load eventually lets the disk work through its backlog and return to normal.
Are you talking about the measured load (as part of the load average) or a general state of the machine as 'heavily loaded'?
The short answer is: it should be smaller than the number of CPU cores you have, if the sort of tasks you're running are actually using the CPU (vs. just dispatching I/O, for example). A 24-core machine with load averages in the teens is still pretty responsive. Then again, a more precise definition would specifically say that it's the number of processes in the runnable state.
It's a measurement of the CPU as a performance bottleneck. Other bottlenecks (e.g. your disk(s)) aren't covered.
To some degree, it's voodoo. When you deal with machines of a particular OS every day, you get a feel for what the numbers mean.
> you can't determine why (no processes are using much CPU)
Roughly, the load avg is the length of the run queue (how many processes are waiting to be run or are being run, vs. how many processor cores are available) with some IO activity thrown in for good measure. It could be a lot of IO wait or a lot of small processes that are driving up the load avg -- not necessarily just one process trying to hog CPU time.
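The raw numbers come from /proc/loadavg; the output looks something like this (values made up for illustration):
# cat /proc/loadavg
0.20 0.18 0.12 1/80 11206
The first three fields are the 1-, 5- and 15-minute averages, the fourth is currently-runnable tasks over total tasks, and the last is the most recently allocated PID.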
There are a lot of good suggestions here -- many sysadmins are too reliant on load avg (myself somewhat included).
1. Load averages are not directly comparable across different Unix variants such as FreeBSD, Linux, Solaris, etc.
2. Load average is a good way to quickly check if there is anything else to look at.
However, a far better series of tools is sar plus iostat, vmstat, etc., which should help you more quickly determine whether your problem is CPU, disk, or network IO.
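For the disk side specifically, something like this (interval and count chosen arbitrarily):
# iostat -x 1 5
The extended output shows per-device await and %util, which is usually enough to tell a saturated disk from a busy CPU.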
So, what about threads? I've seen Linux versions where each thread counted toward the load just like the base process, resulting in a huge load average for a multi-threaded server application.
In Linux, running threads that are shown in 'top' (with thread mode toggled on via 'H') to be in state 'R' or 'D' are counted in the load average. You'll definitely see this in programs with a ton of threads, like MySQL or the JVM.
If you're not in thread mode in top, this is why you will sometimes see a process consuming > 100% CPU usage.
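If you want the same view without top, here's a rough sketch that counts R/D threads by command name (the stat field can carry extra modifier letters, so counts may split across lines):
# ps -eLo stat,comm | awk '$1 ~ /^[RD]/' | sort | uniq -c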
As far as I can tell, this article doesn't mention the most frequent problem people have comprehending high load averages on Linux: the inclusion of processes in uninterruptible sleep. This can result in a system with virtually no CPU load reporting a very high load average. (And, IMHO, makes the load average much less useful a metric than a pure running/run-queue-based method.)
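A quick way to check whether that's what is inflating the number is to list the D-state processes directly, e.g. (wchan may or may not resolve to a useful symbol depending on the kernel):
# ps -eo state,pid,wchan:30,comm | awk '$1 == "D"'
If that list is long while CPU utilization is low, the load average is telling you about I/O, not CPU.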
How about an article about measuring memory usage?
From what I've read, measuring the actual total memory consumption of an individual process is nontrivial on both Mac OS X and Linux, because of the way the stats are generated for things like ps, top, etc. and the way both kernels share memory between processes whenever possible.
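On Linux, one reasonable approximation for a single process is to sum the proportional set size (Pss) from smaps; this is only a sketch, assumes a kernel new enough to expose Pss (around 2.6.25), and <pid> is a placeholder:
# awk '/^Pss:/ {sum += $2} END {print sum " kB"}' /proc/<pid>/smaps
Pss divides each shared page by the number of processes sharing it, so it avoids double-counting shared libraries the way RSS does.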
My rule of thumb is that the end users of a web application will perceive that the system is "slow" if the 1 minute load average is above 4 on the application or database server.
Huh? What if I have a 48 core AMD server? A fully CPU-bound load average of 48 would be great.
Load average can tell you if the system is under-used, but it can't tell you how the system is over-used.
I regularly have a few dual core systems spike to load averages of 50+ because of suboptimal NFS mounts. The NFS issue keeps processes waiting around with nothing to do for a while. Those processes are still counted towards the "load" even though they have no CPU activity (they are blocked on IO).
Agreed, it is not accurate for huge servers and IO-bound cases, but I have found it to be a good rule of thumb for a LAMP web application stack on anything from single to 8-core machines.
It is a useful metric to start digging deeper into the machine.
The Linux kernel also checks to see if there are any tasks in a short-term sleep state called TASK_UNINTERRUPTIBLE. If there are, they are also included in the load average sample.