A command line binary written in C for performing fast Monte Carlo Permutation Tests for validating A/B test results.
Uses Double precision SIMD-oriented Fast Mersenne Twister (dSFMT) for wicked fast pseudo-randomization.
Our clients at Monetate run thousands of A/B tests each year. We've recently updated our statistical models to calculate the significance of these campaigns' goals (Conversion Rate, Revenue Per Visit, etc.) and wanted a way to:
- Validate the accuracy of these statistical models
- Create a feedback loop to further improve the models
We chose Monte Carlo to accomplish this, but quickly found that running simulations in our language of choice, Python, on A/B Tests with several million visitors wasn't going to cut it.
Jump right to Installation, CLI Usage or straight to the code if you're already familiar with Monte Carlo Testing.
Say you're running an A/B Test on a site to see if the experiment variant had a significant effect on Revenue Per Visit. To simplify things a bit, let's begin by looking at just three of these visitors. Our first visitor, John, visited twice but did not buy anything. Suzy visited once and made one $9 purchase. Bob visited twice and made two purchases at $8 and $9.
In this case, a visitor may have multiple visits but the A/B Test randomizes based on visitor to give everyone a consistent experience.
User | Group | Visits (y0) | Purchase Amount Sum (y1) | Purchase Amount Sum of Squares (y2) * |
---|---|---|---|---|
john | Experiment | 2 | 0 | 0 |
suzy | Control | 1 | 9 | 81 |
bob | Experiment | 2 | 17 | 145 |
* We include y2 here for calculating variance in our models
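The rows in this table can be derived from raw visit data. Here is a minimal sketch, assuming each visitor's activity is available as a visit count plus a list of purchase amounts. The `summarize` helper and the shape of `raw` are hypothetical and not part of the simulator.

```python
def summarize(visits, purchases):
    """Collapse one visitor's activity into (y0, y1, y2)."""
    y0 = visits                           # number of visits
    y1 = sum(purchases)                   # sum of purchase amounts
    y2 = sum(p * p for p in purchases)    # sum of squared purchase amounts
    return y0, y1, y2

# Visitors from the example: (visit count, purchase amounts)
raw = {
    "john": (2, []),
    "suzy": (1, [9]),
    "bob": (2, [8, 9]),
}

rows = {user: summarize(v, p) for user, (v, p) in raw.items()}
for user, (y0, y1, y2) in rows.items():
    print(user, y0, y1, y2)
```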
Storing the data this way lets us compute the observed difference in revenue per visit between the two groups, along with its statistical significance.
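As a rough sketch of why the three sums are enough: if each visit is treated as one sample (a visit without a purchase contributes 0), a group's mean and variance of revenue per visit follow from y0, y1, and y2 alone. The function name and the choice of the unbiased sample variance are assumptions for illustration, not necessarily what our models use.

```python
def mean_and_variance(y0, y1, y2):
    """Mean and unbiased sample variance from count, sum, and sum of squares."""
    mean = y1 / y0
    # sum((x - mean)^2) equals y2 - y0 * mean^2; divide by (n - 1)
    variance = (y2 - y0 * mean * mean) / (y0 - 1)
    return mean, variance

# Experiment group from the table: john (2, 0, 0) + bob (2, 17, 145)
exp_mean, exp_var = mean_and_variance(4, 17, 145)
# Control group: suzy (1, 9, 81); a single visit has no sample variance
ctl_mean = 9 / 1

print(exp_mean, exp_var)      # 4.25 24.25
print(exp_mean - ctl_mean)    # observed difference: -4.75
```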
To verify the computed significance, we can also send this data through a Monte Carlo simulator to estimate how likely it is that a difference this large would arise from random assignment alone.
The simulator performs many random permutations. On each iteration, it randomly assigns every visitor to a theoretical Experiment or Control group and sums up y0, y1, and y2 for all visitors in each group.
The table below shows two simulations. The first put all three visitors in the Experiment group and none in the Control group. The second put John and Suzy in the Experiment group and Bob in the Control group.
Simulation | Group | Visits (y0) | Purchase Amount Sum (y1) | Purchase Amount Sum of Squares (y2) |
---|---|---|---|---|
0 | Experiment | 5 | 26 | 226 |
0 | Control | 0 | 0 | 0 |
1 | Experiment | 3 | 9 | 81 |
1 | Control | 2 | 17 | 145 |
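A single iteration can be sketched as follows. This is an illustrative Python version, not the simulator's C implementation: it uses Python's `random` module rather than dSFMT, and the `simulate_once` name is hypothetical. Each visitor is assigned to a group according to the weights, and that visitor's y0, y1, and y2 are added to the group's totals.

```python
import random

def simulate_once(visitors, weights, rng):
    """Randomly assign each visitor to a group and sum y0, y1, y2 per group."""
    totals = [[0, 0, 0] for _ in weights]
    group_ids = list(range(len(weights)))
    for y0, y1, y2 in visitors:
        g = rng.choices(group_ids, weights=weights)[0]
        totals[g][0] += y0
        totals[g][1] += y1
        totals[g][2] += y2
    return totals

visitors = [(2, 0, 0), (1, 9, 81), (2, 17, 145)]   # john, suzy, bob
rng = random.Random(42)                             # seeded for repeatability
for i in range(2):
    print(i, simulate_once(visitors, [0.5, 0.5], rng))
```

Whatever the assignment, the per-group totals always add up to the overall totals (5, 26, 226), which is a handy sanity check.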
Let's now assume we had 2 million visitors split evenly into Experiment and Control groups. We observed a difference in revenue per visit of $1.50 with a p-value of 11% in our two-tailed t-test.
Now we run the Monte Carlo simulator with 10,000 iterations. We can then calculate the difference between the two groups for each of the ten thousand simulations. Most of these differences will be near zero because we randomly distributed the visitors between the two groups, but some may lie outside of our $1.50 observed difference.
Simulation | Group | Visits (y0) | Purchase Amount Sum (y1) | Purchase Amount Sum of Squares (y2) |
---|---|---|---|---|
0 | Experiment | 1000129 | 124124 | 9193930 |
0 | Control | 999871 | 111123 | 10003234 |
1 | Experiment | 999976 | 154320 | 8100857 |
1 | Control | 1000024 | 82394 | 7231043 |
... | ... | ... | ... | ... |
9999 | Experiment | 993429 | 100001 | 9534543 |
9999 | Control | 1006571 | 129993 | 8738439 |
If 1,000 of the 10,000 random iterations show a difference at least as large as $1.50 in either direction, then roughly 10% of purely random assignments produce a difference as extreme as the one we observed. In other words, the simulated p-value is 10%.
Although technically not a direct comparison, we can compare our computed p-value of 11% to our simulated 10% result to determine whether or not the model is accurate enough.
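The counting step can be sketched as follows, assuming the per-simulation differences in revenue per visit have already been computed. The `simulated_p_value` helper is hypothetical, the data is a toy example, and differences are counted in either direction to mirror a two-tailed test.

```python
def simulated_p_value(differences, observed):
    """Fraction of simulated differences at least as extreme as the observed one."""
    extreme = sum(1 for d in differences if abs(d) >= abs(observed))
    return extreme / len(differences)

# Toy data: 10 simulated differences, one of which exceeds the observed 1.50
diffs = [0.1, -0.3, 0.0, 1.6, -0.2, 0.4, -0.1, 0.2, -0.5, 0.3]
print(simulated_p_value(diffs, 1.50))   # 0.1
```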
You can grab a pre-compiled binary for your OS and architecture from a GitHub Release:
wget https://github.com/monetate/monte-carlo-simulator/releases/download/v0.1.0/monte-carlo-simulator-v0.1.0-Linux-i386.tar.gz \
-O monte-carlo-simulator-v0.1.0-Linux-i386.tar.gz
tar -zxvf monte-carlo-simulator-v0.1.0-Linux-i386.tar.gz
cd monte-carlo-simulator-v0.1.0-Linux-i386
Currently, the simulator will only build on a machine with a CPU with Intel's SSE2 instructions and a C compiler which supports these features.
It's known to work on:
- Amazon's EC2 instances with gcc 4.1.2
- Travis CI Bluebox workers with gcc 4.6 and clang 3.3
The default `make` target will build a binary named `simulate` in the project's root directory.
CC=gcc make
The `simulate` binary takes a csv on `stdin` and outputs a resulting csv on `stdout`.
cat /path/to/samples.csv | (./simulate 10000 0.5 0.5) > /path/to/results.csv
It accepts the number of simulations to run as the first positional argument. The following arguments describe the weighting for each of your groups.
You can pass weights as percentages or whole numbers. The following three variants are all equivalent:
./simulate 10000 2 3 5
./simulate 10000 0.2 0.3 0.5
./simulate 10000 400 600 1000
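One way to see why these are equivalent: only the relative proportions matter. Here is a minimal sketch, assuming the weights are normalized by dividing each by their sum (the simulator's internal handling may differ).

```python
def normalize(weights):
    """Scale weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

for weights in ([2, 3, 5], [0.2, 0.3, 0.5], [400, 600, 1000]):
    print([round(p, 6) for p in normalize(weights)])
```

All three inputs yield the same proportions of 0.2, 0.3, and 0.5.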
The input csv is assumed to have exactly four columns with no header row.
- `id`: Unique id
- `y0`: The number of samples
- `y1`: The sum of the samples
- `y2`: The sum of squares of the samples
john,2,0.0,0.0
suzy,1,9.0,81.0
bob,2,17,145.0
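Input in this shape can be produced with a few lines of Python. This sketch uses only the standard `csv` module; the `write_samples` name is hypothetical.

```python
import csv
import io

def write_samples(rows, out):
    """Write (id, y0, y1, y2) rows as header-less CSV."""
    writer = csv.writer(out, lineterminator="\n")
    for user_id, y0, y1, y2 in rows:
        writer.writerow([user_id, y0, float(y1), float(y2)])

rows = [("john", 2, 0, 0), ("suzy", 1, 9, 81), ("bob", 2, 17, 145)]
buf = io.StringIO()
write_samples(rows, buf)
print(buf.getvalue(), end="")
```

In practice `out` would be a file opened for writing, which can then be piped to `simulate` on `stdin`.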
The result csv will contain 5 columns with no header row.
- `simulation`: Simulation index
- `group_id`: Group id
- `y0`: The number of samples in the group
- `y1`: The sum of the samples in the group
- `y2`: The sum of squares in the group
0,0,5.0,26.0,226.0
0,1,0.0,0.0,0.0
1,0,3.0,9.0,81.0
1,1,2.0,17.0,145.0
...
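To turn these rows into per-simulation differences in revenue per visit, something like the following may help. It is a sketch that assumes exactly two groups with ids 0 and 1; the `rpv_differences` name is hypothetical, and which group id corresponds to Experiment or Control is an assumption you should check against the order of your weights.

```python
import csv
import io
from collections import defaultdict

def rpv_differences(lines):
    """Return {simulation: RPV(group 0) - RPV(group 1)} from result rows."""
    sims = defaultdict(dict)
    for sim, group, y0, y1, _y2 in csv.reader(lines):
        sims[int(sim)][int(group)] = (float(y0), float(y1))
    diffs = {}
    for sim, groups in sims.items():
        (a0, a1), (b0, b1) = groups[0], groups[1]
        if a0 == 0 or b0 == 0:
            continue                  # RPV is undefined for an empty group
        diffs[sim] = a1 / a0 - b1 / b0
    return diffs

sample = io.StringIO(
    "0,0,5.0,26.0,226.0\n0,1,0.0,0.0,0.0\n"
    "1,0,3.0,9.0,81.0\n1,1,2.0,17.0,145.0\n"
)
result = rpv_differences(sample)
print(result)   # {1: -5.5}
```

Simulation 0 is skipped here because its second group is empty, which can happen with tiny inputs but is unlikely with millions of visitors.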
- Jeffrey Persch
- Chris Conley
- Gil Raphaelli
- Austin Rochford
CC=gcc make test
This project is released under the MIT License.