Skip to main content

What is loadavg (load average) in Linux when you run uptime?

I searched online for answers, but none of them are good enough for me --- most of them says it's equal to running queue, but some observations on an I/O busy NFS server disagree with that. So I checked source code.

TLDR;

 load average can be considered as a sum of total running processes (the run queue size) and interruptible process over specific time period. This applies to at least version 2.6.x and 3.x (I only checked source code of those two versions)

BTW, the uptime command is only read and output the data in /proc/loadavg for load average.

(A little bit) Long but official explanation:

copied from Linux kernel source code: of v3.14-rc6

/*
 * Global load-average calculations
 *
 * We take a distributed and async approach to calculating the global load-avg
 * in order to minimize overhead.
 *
 * The global load average is an exponentially decaying average of nr_running +
 * nr_uninterruptible.
 *
 * Once every LOAD_FREQ:
 *
 *   nr_active = 0;
 *   for_each_possible_cpu(cpu)
 * nr_active += cpu_of(cpu)->nr_running + cpu_of(cpu)->nr_uninterruptible;
 *
 *   avenrun[n] = avenrun[0] * exp_n + nr_active * (1 - exp_n)
 *
 * Due to a number of reasons the above turns in the mess below:
 *
 *  - for_each_possible_cpu() is prohibitively expensive on machines with
 *    serious number of cpus, therefore we need to take a distributed approach
 *    to calculating nr_active.
 *
 *        \Sum_i x_i(t) = \Sum_i x_i(t) - x_i(t_0) | x_i(t_0) := 0
 *                      = \Sum_i { \Sum_j=1 x_i(t_j) - x_i(t_j-1) }
 *
 *    So assuming nr_active := 0 when we start out -- true per definition, we
 *    can simply take per-cpu deltas and fold those into a global accumulate
 *    to obtain the same result. See calc_load_fold_active().
 *
 *    Furthermore, in order to avoid synchronizing all per-cpu delta folding
 *    across the machine, we assume 10 ticks is sufficient time for every
 *    cpu to have completed this task.
 *
 *    This places an upper-bound on the IRQ-off latency of the machine. Then
 *    again, being late doesn't loose the delta, just wrecks the sample.
 *
 *  - cpu_rq()->nr_uninterruptible isn't accurately tracked per-cpu because
 *    this would add another cross-cpu cacheline miss and atomic operation
 *    to the wakeup path. Instead we increment on whatever cpu the task ran
 *    when it went into uninterruptible state and decrement on whatever cpu
 *    did the wakeup. This means that only the sum of nr_uninterruptible over
 *    all cpus yields the correct result.
 *
 *  This covers the NO_HZ=n code, for extra head-aches, see the comment below.
 */


Calculation is done by below formula:

active = sum of running and interruptible processes

then: 

active = active > 0 ? active * FIXED_1 : 0;

avenrun[0] = calc_load(avenrun[0], EXP_1, active);  /* 1 min loadavg */
avenrun[1] = calc_load(avenrun[1], EXP_5, active);  /* 5 min loadavg */

avenrun[2] = calc_load(avenrun[2], EXP_15, active); 
/* 10 min loadavg */


on 3.x:

static unsigned long
calc_load(unsigned long load, unsigned long exp, unsigned long active)
{
    load *= exp;
    load += active * (FIXED_1 - exp);
    load += 1UL << (FSHIFT - 1);
    return load >> FSHIFT;
}


on 2.6.x:                                                                                                                                                                                                    

static unsigned long
calc_load(unsigned long load, unsigned long exp, unsigned long active)
{
    load *= exp;                                                                                                     
    load += active * (FIXED_1 - exp);
    return load >> FSHIFT;
}

and constants for both versions are defined as:

#define FSHIFT          11              /* nr of bits of precision */                                                
#define FIXED_1         (1<<FSHIFT)     /* 1.0 as fixed-point */
#define LOAD_FREQ       (5*HZ+1)        /* 5 sec intervals */
#define EXP_1           1884            /* 1/exp(5sec/1min) as fixed-point */
#define EXP_5           2014            /* 1/exp(5sec/5min) */
#define EXP_15          2037            /* 1/exp(5sec/15min) */

 

Comments

Popular posts from this blog

How to send command / input to multiple Putty window simultaneously

Putty is one of the best and must-have freeware for people working on Linux/Unix but use Windows as client like me.  We need to manage many servers and sometimes we are hoping we can run/execute same command on multiple host at same time, or just input same thing to multiple host. I searched online for a tool can do this. And it looks like PuTTYCS (PuTTY Command Sender) is the only one existing. But I’m a little bit disappointing after tried the software, it’s good but not good enough. It can only send command to each window one by one, and you have to wait until last window got input. So I think I should do something, and puttyCluster was born ( https://github.com/mingbowan/puttyCluster ) interface is simple: When you input Windows title pattern in the text box, you will be prompt for how many windows matching the pattern, like this: and you click the edit box under “cluster input”, what ever key you pressed will pass to all those windows simultaneously, even “Ctrl-C”, “Esc” ...

enable special character support in Graphite metric name

Problem Graphite doesn’t support special characters like “ “ (empty space), “/” slash etc. Because it expect everything to be just ASCII to split/processing them, and then make directories based on metric name. For example:   Metric:     datacenter1.server1.app1.metric1.abc Will create datacenter1/server1/app1/metric1/abc.wsp But Metric: datacentter1.this is a test/with/path.app.test will fail when create directory So any special name not allow to appear in directory/file name is not supported by Graphite.   What we can do?   We can urlEncode the metric name which has special characters. So like “/var/opt” (not valid file name) will become “%2Fvar%2Fopt”(now valid), using urlEncode instead of others (like BASE64) is because this will keep most of data readable.   So what to change? 1. urlEncode metric name before send to Graphite (if you always sending metrics using text/line mode instead of pickle/batch mode, then you may consider modify ...