13. Mai 2011

Sampling from a Poisson distribution - a benchmark

Having a poisson-distributed random variable X, how can we efficiently generate samples (or say realizations) of that random variable? This article shows how Knuth’s technique can be derived and it also indicates that the Ratio-of-Uniforms method is quite efficient.

X is the number of arrivals in a poisson process within a fixed period of time. The inter-arrival times follow an exponential distribution. The following small benchmark makes an attempt to compare the time necessary to compute samples using three different algorithms:

  1. Algorithm exp, do it yourself
  2. Algorithm uni by D. Knuth
  3. Algorithm rou, from Numerical Recipes

(Let’s peek at the results up-front: rou seems to win this challenge!)

exp – counting inter-arrival times

We want a random sample of the number of arrivals within a fixed period of time. We know the distribution of the inter-arrival times. By summing up inter-arrival times until the sum exceeds the fixed period of time (use the unit interval for convenience) we generate a sample number of arrivals. Formalizing this idea, we have a sequence of exponentially distributed random values:

37376a61189a3d4861995c15ada16f893800cc4d

The i-th interval represents the time until the i-th arrival. We then find a number k such that

a21795030329a60e5871a38a2299aaf5d1f29c7d

This number k is the sought-after random sample.

double exponential(Ran *ran, double lambda) {
    double p;
    do {
        p = ran->doub();
    } while (p == 0.);
    return -log(p) / lambda;
}

int poisson_exp(Ran *ran, double lambda) {
    double t = 0;
    int k = 0;
    while (1) {
        t += exponential(ran, lambda);
        if (t > 1) {
            return k;
        } else {
            k++;
        }
    }
}

uni – counting inter-arrival times, improved

Usually, the inversion method is used for sampling from an exponential distribution.

3058f17208798b7a9cddef5597c471e0ffc084e9

Note the logarithm. It is the computationally most expensive operation when generating a sample inter-arrival time. Knuth’s variant gets rid of this nuisance. Here’s how it works

cb7ca50d4cdb5765b6a67cc037cf9b59265d1856

Finding a k that satisfies the last inequation only requires λ+1 uniform samples on average, reflecting in shortened computation times.

int poisson_uni(Ran *ran, double lambda) {
    int k = -1;
    double p = 1.;
    double lambda_exp = exp(-lambda);
    do {
        k++;
        p *= ran->doub();
    } while (p > lambda_exp);
    return k;
}

rou – using Ratio-of-Uniforms method

No explanation here (unfortunately…). Just take the algorithm from Numerical Recipes and see what it’s worth. Actually, I modified it to use Ratio-of-Uniforms for all values of λ for the sake of comparison. Sources are licensed and therefore cannot be listed here.

Benchmark

Taking different values for λ (1, 2, 4, 16, 32, 64) and generating a million random samples using each of the three algorithms on a notebook from 2008, here’s the results (interpolated).

 g++ -O2 benchmark.cpp -I . -o benchmark
./benchmark
 Numbers generated per algorithm: 1000000
Expected value: 1
    method       mean   time [s]
       uni      1.000      0.130
       exp      0.998      0.300
       rou      1.036      0.310
 Numbers generated per algorithm: 1000000
Expected value: 2
    method       mean   time [s]
       uni      2.001      0.140
       exp      2.000      0.460
       rou      2.008      0.310
 Numbers generated per algorithm: 1000000
Expected value: 4
    method       mean   time [s]
       uni      3.999      0.190
       exp      3.999      0.750
       rou      4.000      0.300
 Numbers generated per algorithm: 1000000
Expected value: 8
    method       mean   time [s]
       uni      8.000      0.290
       exp      8.000      1.340
       rou      8.001      0.300
 Numbers generated per algorithm: 1000000
Expected value: 16
    method       mean   time [s]
       uni     16.002      0.470
       exp     15.996      2.440
       rou     16.000      0.220
 Numbers generated per algorithm: 1000000
Expected value: 32
    method       mean   time [s]
       uni     32.007      0.820
       exp     32.001      4.710
       rou     31.996      0.220
 Numbers generated per algorithm: 1000000
Expected value: 64
    method       mean   time [s]
       uni     63.982      1.530
       exp     64.009      9.270
       rou     63.995      0.210

Sources

benchmark.zip : In order to compile, you need to obtain files nr3.h, ran.h, deviates.h, gamma.h from Numerical Recipes and put them into the folder with the other source files.