Sampling from a Poisson distribution - a benchmark

Given a Poisson-distributed random variable X, how can we efficiently generate samples (realizations) of it? This article shows how Knuth’s technique can be derived, and the benchmark indicates that the Ratio-of-Uniforms method is efficient, especially for larger λ.

X is the number of arrivals of a Poisson process within a fixed period of time, and the inter-arrival times follow an exponential distribution. The following small benchmark compares the time needed to compute samples with three different algorithms:

Algorithm exp, do it yourself
Algorithm uni by D. Knuth
Algorithm rou, from Numerical Recipes

(Let’s peek at the results up front: uni is fastest for small λ, but rou wins this challenge for λ = 16 and above!)

exp – counting inter-arrival times

We want a random sample of the number of arrivals within a fixed period of time, and we know the distribution of the inter-arrival times. By summing up inter-arrival times until the sum exceeds the fixed period of time (for convenience, the unit interval), we obtain a sample of the number of arrivals. Formalizing this idea, we have a sequence of independent, exponentially distributed random values:

$t_1, t_2, t_3, \ldots \quad \text{with} \quad t_i \sim \operatorname{Exp}(\lambda)$

$t_i$ is the time between the (i-1)-th and the i-th arrival, so the partial sum $t_1 + \dots + t_i$ is the time of the i-th arrival. We then find the number k such that

$\sum_{i=1}^{k} t_i \le 1 < \sum_{i=1}^{k+1} t_i$

This number k is the sought-after random sample.

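#include <cmath>  // log; Ran is the generator from Numerical Recipes (ran.h)

// Exponentially distributed inter-arrival time with rate lambda,
// generated by inversion: -log(U) / lambda.
double exponential(Ran *ran, double lambda) {
    double p;
    do {
        p = ran->doub();  // uniform deviate; 0 is rejected to avoid log(0)
    } while (p == 0.);
    return -log(p) / lambda;
}

// Poisson sample: count arrivals until the accumulated inter-arrival
// times exceed the unit interval.
int poisson_exp(Ran *ran, double lambda) {
    double t = 0;
    int k = 0;
    while (1) {
        t += exponential(ran, lambda);
        if (t > 1) {
            return k;  // the (k+1)-th arrival lies beyond time 1
        } else {
            k++;
        }
    }
}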
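The functions above depend on Ran from Numerical Recipes. For readers without those headers, here is a self-contained sketch of the same counting idea. It is an assumption for portability, not the benchmarked code. std::mt19937_64 replaces Ran, and std::exponential_distribution (which takes the rate λ) draws the inter-arrival times. The names poisson_exp_std and sample_mean_exp are hypothetical.

```cpp
#include <cassert>
#include <random>

// Standalone variant of poisson_exp (sketch): std::mt19937_64 replaces
// Numerical Recipes' Ran, std::exponential_distribution draws inter-arrival times.
int poisson_exp_std(std::mt19937_64 &gen, double lambda) {
    assert(lambda > 0.0);
    std::exponential_distribution<double> inter_arrival(lambda);  // rate lambda
    double t = 0.0;  // time of the latest arrival
    int k = 0;       // arrivals inside the unit interval so far
    for (;;) {
        t += inter_arrival(gen);
        if (t > 1.0) return k;  // this arrival lies beyond the unit interval
        ++k;
    }
}

// Sample mean over n draws; for a Poisson variable it should be close to lambda.
double sample_mean_exp(unsigned seed, double lambda, int n) {
    assert(n > 0);
    std::mt19937_64 gen(seed);
    double sum = 0.0;
    for (int i = 0; i < n; ++i) sum += poisson_exp_std(gen, lambda);
    return sum / n;
}
```

For example, sample_mean_exp(42, 4.0, 200000) should come out close to 4, just as the exp rows of the benchmark do.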

uni – counting inter-arrival times, improved

Usually, the inversion method is used for sampling from an exponential distribution.

$t = -\frac{\ln U}{\lambda}, \quad U \sim \mathcal{U}(0, 1)$

Note the logarithm: it is the computationally most expensive operation when generating an inter-arrival time. Knuth’s variant gets rid of this nuisance. Here is how it works, with $t_i = -\ln U_i / \lambda$:

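$\sum_{i=1}^{k} t_i \le 1 < \sum_{i=1}^{k+1} t_i \iff -\sum_{i=1}^{k} \ln U_i \le \lambda < -\sum_{i=1}^{k+1} \ln U_i \iff \prod_{i=1}^{k+1} U_i < e^{-\lambda} \le \prod_{i=1}^{k} U_i$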

Finding the k that satisfies the last inequality takes k+1 uniform samples and no logarithm at all. Since E[k] = λ, that is λ+1 uniform samples on average, and the saved logarithms show in shorter computation times.

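// Knuth: multiply uniform deviates until the product drops to exp(-lambda)
// or below; the number of factors minus one is the sample. No logarithm.
int poisson_uni(Ran *ran, double lambda) {
    int k = -1;
    double p = 1.;
    // computed once per call; underflows to 0 for lambda above roughly 745
    double lambda_exp = exp(-lambda);
    do {
        k++;
        p *= ran->doub();
    } while (p > lambda_exp);
    return k;
}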
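To check the claim that Knuth’s method consumes λ+1 uniforms per sample on average, here is a self-contained sketch that counts the draws. It assumes std::mt19937_64 with std::uniform_real_distribution in place of Ran. The names poisson_knuth and mean_draws are hypothetical. The bound of 700 on λ is an arbitrary margin that keeps exp(-λ) from underflowing in double precision.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Knuth's product-of-uniforms method (sketch). Every uniform deviate drawn is
// counted in `draws`, so the average cost per sample can be measured.
int poisson_knuth(std::mt19937_64 &gen, double lambda, long long &draws) {
    // Arbitrary safety margin (assumption): exp(-lambda) stays a normal double.
    assert(lambda > 0.0 && lambda < 700.0);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    const double limit = std::exp(-lambda);  // computed once, no log needed
    double p = 1.0;
    int k = -1;
    do {
        ++k;
        p *= uniform(gen);
        ++draws;
    } while (p > limit);
    return k;
}

// Average number of uniforms per sample over n samples (expected: lambda + 1).
double mean_draws(unsigned seed, double lambda, int n) {
    assert(n > 0);
    std::mt19937_64 gen(seed);
    long long draws = 0;
    for (int i = 0; i < n; ++i) poisson_knuth(gen, lambda, draws);
    return static_cast<double>(draws) / n;
}
```

The linear growth of this count with λ matches the uni timings in the benchmark, which rise steadily with λ.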

rou – using Ratio-of-Uniforms method

No explanation here (unfortunately…). I simply took the algorithm from Numerical Recipes to see what it’s worth. For the sake of comparison, I modified it to use Ratio-of-Uniforms for all values of λ. The Numerical Recipes sources are licensed and therefore cannot be listed here.
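Because the Numerical Recipes code cannot be shown, here is a textbook-style sketch of the ratio-of-uniforms idea for Poisson sampling. It is not the Numerical Recipes algorithm. The class PoissonRoU, the shift m = λ + 0.5 and the plain scan that finds the bounding rectangle are all illustrative assumptions.

The principle is as follows. If (u, v) is uniform on the region $0 < u \le \sqrt{h(v/u)}$, then v/u has a density proportional to h. With the step function $h(y) = p(\lfloor y + m \rfloor)$, where p is the Poisson pmf, $\lfloor v/u + m \rfloor$ is Poisson(λ). Points are drawn uniformly from a bounding rectangle and rejected if they fall outside the region.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <random>

// Textbook ratio-of-uniforms sampler for Poisson(lambda); NOT the Numerical
// Recipes implementation. h(y) = p(floor(y + m)) with Poisson pmf p, shift m.
class PoissonRoU {
public:
    explicit PoissonRoU(double lambda) : lambda_(lambda), m_(lambda + 0.5) {
        assert(lambda > 0.0);
        // Bounding rectangle by scanning j: u <= sqrt(p(j)), and on the cell
        // [j - m, j + 1 - m) the ratio v/u = y gives (j - m)s <= v <= (j + 1 - m)s.
        const int jmax = static_cast<int>(lambda + 10.0 * std::sqrt(lambda) + 20.0);
        for (int j = 0; j <= jmax; ++j) {
            const double s = std::exp(0.5 * log_pmf(j));  // sqrt(p(j))
            u_max_ = std::max(u_max_, s);
            v_min_ = std::min(v_min_, (j - m_) * s);
            v_max_ = std::max(v_max_, (j + 1 - m_) * s);
        }
    }

    int operator()(std::mt19937_64 &gen) const {
        std::uniform_real_distribution<double> unif(0.0, 1.0);
        for (;;) {
            const double u = u_max_ * (1.0 - unif(gen));  // u in (0, u_max]
            const double v = v_min_ + (v_max_ - v_min_) * unif(gen);
            const double x = v / u + m_;
            if (x < 0.0) continue;  // h is zero there: reject
            const double j = std::floor(x);
            // Accept if u <= sqrt(p(j)), compared in log space.
            if (2.0 * std::log(u) <= log_pmf(j)) return static_cast<int>(j);
        }
    }

private:
    double log_pmf(double j) const {  // log of e^{-lambda} lambda^j / j!
        return -lambda_ + j * std::log(lambda_) - std::lgamma(j + 1.0);
    }
    double lambda_, m_;
    double u_max_ = 0.0, v_min_ = 0.0, v_max_ = 0.0;
};
```

The constructor does the setup once per λ. After that, the expected number of attempts per sample stays bounded as λ grows (roughly 1.4 for large λ, by a normal approximation), and each attempt costs two uniforms, one log and one lgamma. This fixed cost per sample is what makes the method attractive for large λ.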

Benchmark

I took different values for λ (1, 2, 4, 8, 16, 32, 64) and generated a million random samples with each of the three algorithms on a notebook from 2008. Here are the results:

g++ -O2 benchmark.cpp -I . -o benchmark
./benchmark

Numbers generated per algorithm and λ: 1000000. "mean" is the sample mean, whose expected value is λ.

lambda   uni mean  uni time [s]   exp mean  exp time [s]   rou mean  rou time [s]
     1      1.000         0.130      0.998         0.300      1.036         0.310
     2      2.001         0.140      2.000         0.460      2.008         0.310
     4      3.999         0.190      3.999         0.750      4.000         0.300
     8      8.000         0.290      8.000         1.340      8.001         0.300
    16     16.002         0.470     15.996         2.440     16.000         0.220
    32     32.007         0.820     32.001         4.710     31.996         0.220
    64     63.982         1.530     64.009         9.270     63.995         0.210
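benchmark.cpp itself is not reproduced here. To illustrate how timings like these can be taken, the following sketch times n calls of any sampler with std::chrono::steady_clock. It returns the sample mean alongside the elapsed seconds. The names bench and BenchResult are hypothetical, and the seeding scheme is an assumption.

```cpp
#include <cassert>
#include <chrono>
#include <random>

// Result of one benchmark run: sample mean and wall-clock time in seconds.
struct BenchResult {
    double mean;
    double seconds;
};

// Times n calls of `sample(gen)`. Summing the results and returning their
// mean keeps the compiler from optimizing the sampling loop away.
template <class Sampler>
BenchResult bench(Sampler sample, int n, unsigned seed) {
    assert(n > 0);
    std::mt19937_64 gen(seed);
    long long sum = 0;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < n; ++i) sum += sample(gen);
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double> elapsed = stop - start;
    return {static_cast<double>(sum) / n, elapsed.count()};
}
```

For instance, passing a lambda that wraps std::poisson_distribution<int> would time the standard library’s own Poisson sampler for comparison. Absolute times will of course differ from those of a 2008 notebook.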

Sources

benchmark.zip: to compile it, you need to obtain the files nr3.h, ran.h, deviates.h and gamma.h from Numerical Recipes and put them into the folder with the other source files.


a blog by Julius Adorf
