What is loadavg (load average) in Linux when you run uptime?

I searched online for answers, but none of them were good enough for me: most say it equals the run queue length, but observations on an I/O-busy NFS server disagree with that. So I checked the source code.

TL;DR:

The load average can be thought of as an exponentially decaying average of the number of runnable processes (the run queue size) plus the number of processes in uninterruptible sleep, which are typically waiting on I/O. This applies to at least versions 2.6.x and 3.x (I only checked the source code of those two versions).

By the way, the uptime command simply reads the load averages from /proc/loadavg and prints them.
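Since uptime just reads /proc/loadavg, your own program can do the same. A minimal C sketch, assuming Linux and a line of the form "0.52 0.58 0.59 1/389 12345"; the names parse_loadavg and read_loadavg are my own, not from any library:

```c
#include <stdio.h>

/* Parse the first three fields of a /proc/loadavg line, e.g.
 * "0.52 0.58 0.59 1/389 12345". Returns 0 on success, -1 otherwise. */
int parse_loadavg(const char *line, double avg[3])
{
    if (sscanf(line, "%lf %lf %lf", &avg[0], &avg[1], &avg[2]) != 3)
        return -1;
    return 0;
}

/* Read /proc/loadavg (Linux only) and parse the 1, 5 and 15 minute averages. */
int read_loadavg(double avg[3])
{
    char line[256];
    FILE *fp = fopen("/proc/loadavg", "r");

    if (fp == NULL)
        return -1;
    if (fgets(line, sizeof line, fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    return parse_loadavg(line, avg);
}
```

Call read_loadavg(a) with a double a[3]; on success a[0], a[1] and a[2] hold the 1, 5 and 15 minute averages. The remaining fields (runnable/total scheduling entities, and the PID of the most recently created process) are ignored here. glibc also offers getloadavg(3) for the same three numbers.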

The longer, official explanation:

Copied from the Linux kernel source code (v3.14-rc6):

/*
 * Global load-average calculations
 *
 * We take a distributed and async approach to calculating the global load-avg
 * in order to minimize overhead.
 *
 * The global load average is an exponentially decaying average of nr_running +
 * nr_uninterruptible.
 *
 * Once every LOAD_FREQ:
 *
 *   nr_active = 0;
 *   for_each_possible_cpu(cpu)
 *     nr_active += cpu_of(cpu)->nr_running + cpu_of(cpu)->nr_uninterruptible;
 *
 *   avenrun[n] = avenrun[0] * exp_n + nr_active * (1 - exp_n)
 *
 * Due to a number of reasons the above turns in the mess below:
 *
 *  - for_each_possible_cpu() is prohibitively expensive on machines with
 *    serious number of cpus, therefore we need to take a distributed approach
 *    to calculating nr_active.
 *
 *        \Sum_i x_i(t) = \Sum_i x_i(t) - x_i(t_0) | x_i(t_0) := 0
 *                      = \Sum_i { \Sum_j=1 x_i(t_j) - x_i(t_j-1) }
 *
 *    So assuming nr_active := 0 when we start out -- true per definition, we
 *    can simply take per-cpu deltas and fold those into a global accumulate
 *    to obtain the same result. See calc_load_fold_active().
 *
 *    Furthermore, in order to avoid synchronizing all per-cpu delta folding
 *    across the machine, we assume 10 ticks is sufficient time for every
 *    cpu to have completed this task.
 *
 *    This places an upper-bound on the IRQ-off latency of the machine. Then
 *    again, being late doesn't loose the delta, just wrecks the sample.
 *
 *  - cpu_rq()->nr_uninterruptible isn't accurately tracked per-cpu because
 *    this would add another cross-cpu cacheline miss and atomic operation
 *    to the wakeup path. Instead we increment on whatever cpu the task ran
 *    when it went into uninterruptible state and decrement on whatever cpu
 *    did the wakeup. This means that only the sum of nr_uninterruptible over
 *    all cpus yields the correct result.
 *
 *  This covers the NO_HZ=n code, for extra head-aches, see the comment below.
 */
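The two tricks in that comment, folding per-CPU deltas into a global sum and letting a single CPU's nr_uninterruptible go negative, can be sketched in plain C. This is a single-threaded toy, not kernel code: struct cpu_counters, fold_active() and global_active are hypothetical stand-ins for struct rq, calc_load_fold_active() and the kernel's atomic calc_load_tasks counter.

```c
/* Hypothetical per-CPU counters; the kernel keeps the real ones in struct rq. */
struct cpu_counters {
    long nr_running;          /* tasks runnable on this CPU */
    long nr_uninterruptible;  /* may be negative on an individual CPU */
    long last_active;         /* value last folded into the global sum */
};

/* Global accumulator, standing in for the kernel's calc_load_tasks. */
static long global_active;

/* Fold this CPU's change since its previous fold into the global sum,
 * in the spirit of calc_load_fold_active(). Returns the delta. */
long fold_active(struct cpu_counters *c)
{
    long nr_active = c->nr_running + c->nr_uninterruptible;
    long delta = nr_active - c->last_active;

    c->last_active = nr_active;
    global_active += delta;
    return delta;
}
```

For example, if a task enters uninterruptible sleep on CPU 0 and is woken by CPU 1, CPU 0 ends up with nr_uninterruptible = 1 and CPU 1 with -1. Neither per-CPU value means much on its own, but because only deltas are folded, the global sum still counts every runnable and uninterruptible task exactly once.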

The calculation works as follows:

active = number of running tasks + number of uninterruptible tasks (nr_running + nr_uninterruptible)

then:

active = active > 0 ? active * FIXED_1 : 0;

avenrun[0] = calc_load(avenrun[0], EXP_1, active);  /* 1 min loadavg */
avenrun[1] = calc_load(avenrun[1], EXP_5, active);  /* 5 min loadavg */
avenrun[2] = calc_load(avenrun[2], EXP_15, active); /* 15 min loadavg */
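In real-number terms, each calc_load() call applies the exponential moving average below, once every LOAD_FREQ (about 5 seconds). Note that every average decays its own previous value: the kernel comment above writes avenrun[0] on the right-hand side, but the calc_load() calls update avenrun[0], avenrun[1] and avenrun[2] each from itself. Here A(k) is the active count at tick k, T_n is 60, 300 or 900 seconds, and Δt = 5 s:

$$
a_n(k) = a_n(k-1)\, e^{-\Delta t / T_n} + A(k)\,\bigl(1 - e^{-\Delta t / T_n}\bigr),
\qquad \mathrm{EXP}_n \approx 2^{11}\, e^{-\Delta t / T_n}
$$

With FIXED_1 = 2^11 = 2048, the constants come out as 2048·e^(-5/60) ≈ 1884.25, 2048·e^(-5/300) ≈ 2014.15 and 2048·e^(-5/900) ≈ 2036.65, i.e. EXP_1 = 1884, EXP_5 = 2014 and EXP_15 = 2037 after rounding. Starting from an idle machine (a_n(0) = 0) with a constant A, the recurrence unrolls to

$$
a_n(k) = A\,\bigl(1 - e^{-k\,\Delta t / T_n}\bigr)
$$

so after t = T_n seconds the average has only reached 1 - 1/e ≈ 63% of A. For example, one CPU-bound process on an otherwise idle box shows roughly 0.63 as the 1-minute average after one minute, and the 15-minute average lags much further behind.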


On 3.x:

static unsigned long
calc_load(unsigned long load, unsigned long exp, unsigned long active)
{
    load *= exp;
    load += active * (FIXED_1 - exp);
    load += 1UL << (FSHIFT - 1);
    return load >> FSHIFT;
}
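To see the 3.x calc_load() in action, here is a self-contained user-space version with the constants from the definitions below, plus a hypothetical update_avenrun() helper that performs the three updates of one LOAD_FREQ tick (in the kernel this happens inside calc_global_load()). It is a sketch, not the kernel code path:

```c
#define FSHIFT   11               /* nr of bits of precision */
#define FIXED_1  (1UL << FSHIFT)  /* 1.0 as fixed-point */
#define EXP_1    1884UL           /* 1/exp(5sec/1min) as fixed-point */
#define EXP_5    2014UL           /* 1/exp(5sec/5min) */
#define EXP_15   2037UL           /* 1/exp(5sec/15min) */

/* 3.x-style update: exponential decay, rounded to nearest by adding
 * half a unit before the final shift. */
static unsigned long
calc_load(unsigned long load, unsigned long exp, unsigned long active)
{
    load *= exp;
    load += active * (FIXED_1 - exp);
    load += 1UL << (FSHIFT - 1);
    return load >> FSHIFT;
}

/* One LOAD_FREQ tick (about every 5 s): fold nr_active into all three
 * averages. avenrun[] holds fixed-point values; nr_active is a task count. */
void update_avenrun(unsigned long avenrun[3], long nr_active)
{
    unsigned long active = nr_active > 0 ? (unsigned long)nr_active * FIXED_1 : 0;

    avenrun[0] = calc_load(avenrun[0], EXP_1, active);
    avenrun[1] = calc_load(avenrun[1], EXP_5, active);
    avenrun[2] = calc_load(avenrun[2], EXP_15, active);
}
```

Starting from zeros with one always-runnable task, the first tick gives avenrun[0] = (2048 * 164 + 1024) >> 11 = 164, about 0.08. After 12 ticks (one minute) avenrun[0] ends up close to 0.63 * 2048 ≈ 1296, in line with the exponential formula, while the 5 and 15 minute averages are still far lower.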


On 2.6.x:

static unsigned long
calc_load(unsigned long load, unsigned long exp, unsigned long active)
{
    load *= exp;
    load += active * (FIXED_1 - exp);
    return load >> FSHIFT;
}
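The only difference between the two versions is the `load += 1UL << (FSHIFT - 1);` line, which adds half a unit so the final shift rounds instead of truncating. A small simulation shows why it matters: feed one permanently runnable task into the 15-minute average. Mathematically it should approach 1.00 (2048 in fixed point), but once the per-tick increment drops below one unit the shift throws it away. In this sketch the truncating 2.6.x-style update stalls at 1862 (about 0.91) and the rounding 3.x-style update at 1955 (about 0.95). The function names are hypothetical; the bodies are the two calc_load() versions above:

```c
#define FSHIFT   11               /* nr of bits of precision */
#define FIXED_1  (1UL << FSHIFT)  /* 1.0 as fixed-point */
#define EXP_15   2037UL           /* 1/exp(5sec/15min) */

/* 2.6.x-style update: the final shift truncates. */
static unsigned long
calc_load_trunc(unsigned long load, unsigned long exp, unsigned long active)
{
    load *= exp;
    load += active * (FIXED_1 - exp);
    return load >> FSHIFT;
}

/* 3.x-style update: add half a unit before shifting to round to nearest. */
static unsigned long
calc_load_round(unsigned long load, unsigned long exp, unsigned long active)
{
    load *= exp;
    load += active * (FIXED_1 - exp);
    load += 1UL << (FSHIFT - 1);
    return load >> FSHIFT;
}

/* Feed one permanently runnable task (active = FIXED_1) into the 15-minute
 * average for `ticks` LOAD_FREQ periods; return the fixed-point result. */
unsigned long settle_15min(int rounding, int ticks)
{
    unsigned long load = 0;

    for (int i = 0; i < ticks; i++)
        load = rounding ? calc_load_round(load, EXP_15, FIXED_1)
                        : calc_load_trunc(load, EXP_15, FIXED_1);
    return load;
}
```

settle_15min(0, 5000) returns 1862 and settle_15min(1, 5000) returns 1955; 5000 ticks is roughly seven hours, well past the point where both values stop changing.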

The constants are defined the same way in both versions:

#define FSHIFT          11              /* nr of bits of precision */
#define FIXED_1         (1<<FSHIFT)     /* 1.0 as fixed-point */
#define LOAD_FREQ       (5*HZ+1)        /* 5 sec intervals */
#define EXP_1           1884            /* 1/exp(5sec/1min) as fixed-point */
#define EXP_5           2014            /* 1/exp(5sec/5min) */
#define EXP_15          2037            /* 1/exp(5sec/15min) */
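Because avenrun[] holds fixed-point values with FSHIFT = 11 fractional bits (hence `active * FIXED_1` in the calculation), something has to turn a value such as 1300 back into "0.63" for /proc/loadavg. The sketch below mirrors the kernel's LOAD_INT/LOAD_FRAC macros; the FIXED_1/200 offset (about 0.005, so the output rounds to the nearest hundredth) reflects my understanding of fs/proc/loadavg.c, and fixed_active() and format_load() are hypothetical helper names:

```c
#include <stdio.h>

#define FSHIFT   11               /* nr of bits of precision */
#define FIXED_1  (1UL << FSHIFT)  /* 1.0 as fixed-point (unsigned long here) */

/* Split a fixed-point load into its integer part and hundredths. */
#define LOAD_INT(x)  ((x) >> FSHIFT)
#define LOAD_FRAC(x) LOAD_INT(((x) & (FIXED_1 - 1)) * 100)

/* Scale a task count into fixed point; a transiently negative count becomes 0,
 * as in "active = active > 0 ? active * FIXED_1 : 0". */
unsigned long fixed_active(long active)
{
    return active > 0 ? (unsigned long)active * FIXED_1 : 0;
}

/* Format a fixed-point load as "I.FF", adding FIXED_1/200 (about 0.005)
 * first so the result rounds to the nearest hundredth. */
void format_load(unsigned long load, char *buf, size_t len)
{
    unsigned long x = load + FIXED_1 / 200;

    snprintf(buf, len, "%lu.%02lu", LOAD_INT(x), LOAD_FRAC(x));
}
```

For example, 1300 / 2048 ≈ 0.635 prints as "0.63", and fixed_active(3) = 6144 prints as "3.00".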

mingbo's tech tips
