This is a project in two parts:
- The first survey and taxonomy for existing online controlled experiment datasets, and
- The ASOS Digital Experiments dataset - the first public dataset that supports the design and running of experiments with adaptive stopping.
The work was accepted to the NeurIPS 2021 Track on Datasets and Benchmarks. (Link to NeurIPS proceedings | OpenReview | arXiv)
If you find the project helpful, please use the following citation:
@inproceedings{liu2021datasets,
author = {Liu, C. H. Bryan and Cardoso, \^{A}ngelo and Couturier, Paul and McCoy, Emma J.},
booktitle = {Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks},
editor = {J. Vanschoren and S. Yeung},
pages = {},
title = {Datasets for Online Controlled Experiments},
url = {https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/file/274ad4786c3abca69fa097b85867d9a4-Paper-round2.pdf},
volume = {1},
year = {2021}
}
A summary of the survey, together with direct links to the datasets, is available on this Open Data StackExchange answer.
The dataset is available on: https://osf.io/64jsb/ .
The experiment notebook uses the parquet form of the dataset. It attempts to download the file before loading it into a pandas dataframe. If the download does not work, you can get the parquet file in one of two ways:
- Download the file via this direct link and place it in the data directory, or
- Use the following command at the root of this repo:
wget -O ./data/asos_digital_experiments_dataset.parquet https://osf.io/62t7f/download
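The download-then-load behaviour described above can be sketched in Python as follows. This is a minimal sketch and not the notebook's actual code: the function names `ensure_dataset` and `load_dataset` and the `fetch` parameter are hypothetical, while the URL and file path are taken from the `wget` command above. It assumes pandas and a parquet engine (for example pyarrow) are installed.

```python
from pathlib import Path
from urllib.request import urlretrieve

# URL and target path as used in the wget command above.
DATASET_URL = "https://osf.io/62t7f/download"
DATASET_PATH = Path("data") / "asos_digital_experiments_dataset.parquet"


def ensure_dataset(path=DATASET_PATH, url=DATASET_URL, fetch=urlretrieve):
    """Download the parquet file to `path` unless it already exists.

    `fetch` is a hypothetical hook (any callable taking url and filename)
    so the download step can be swapped out, e.g. for offline use.
    """
    path = Path(path)
    if not path.exists():
        # Create the data directory on first use.
        path.parent.mkdir(parents=True, exist_ok=True)
        fetch(url, str(path))
    return path


def load_dataset(path=DATASET_PATH):
    """Return the dataset as a pandas DataFrame, downloading it if needed."""
    # Imported here so the download helper works without pandas installed.
    import pandas as pd

    return pd.read_parquet(ensure_dataset(path))
```

Calling `load_dataset()` from the root of the repo would then download the file on the first run and reuse the local copy afterwards.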
These instructions assume you have access to a *nix-like machine (either macOS or Linux will do). If you have a Windows machine, the notebook should still work provided you have the right Python packages installed, but this is untested.
This project uses pyenv and poetry for package management.
Before you start, please ensure you have gcc, make, and pip installed.
For Linux (together with other required libraries):
sudo apt-get install -y make build-essential libssl-dev zlib1g-dev libbz2-dev \
libreadline-dev libsqlite3-dev wget curl llvm libncurses5-dev libncursesw5-dev \
xz-utils tk-dev libffi-dev liblzma-dev python-openssl git
wget https://github.com/pyenv/pyenv-installer/raw/master/bin/pyenv-installer
chmod u+x pyenv-installer
./pyenv-installer
For macOS:
brew install pyenv
brew install pyenv-virtualenv
We then need to configure the PATHs:
export PATH="$HOME/.pyenv/bin:$PATH"
eval "$(pyenv init -)"
eval "$(pyenv virtualenv-init -)"
...and install the right Python version for our environment:
pyenv install 3.9.10
To install poetry, see the installation instructions at https://python-poetry.org/docs/#installation.
git clone https://github.com/liuchbryan/oce-dataset.git
cd oce-dataset
# Switch to Python 3.9.10 for pyenv
pyenv local 3.9.10
poetry env use ~/.pyenv/versions/3.9.10/bin/python
poetry install
poetry shell
Within the newly spawned virtualenv shell, run
jupyter notebook
Once you are done, terminate the Jupyter server using Ctrl+C, and type exit to exit the virtualenv shell.
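If the notebook cannot import its dependencies, a quick check run from inside the virtualenv shell may help narrow down the cause. The sketch below is illustrative only: `check_environment` is a hypothetical helper, and the package list is an assumption (the project's actual dependencies are defined by its poetry configuration). It compares the interpreter's major and minor version against the 3.9 series installed above and reports any package that cannot be found.

```python
import sys
from importlib.util import find_spec

# Major/minor version matching the pyenv install above (3.9.10).
EXPECTED_PYTHON = (3, 9)
# Assumed package names; adjust to match the project's poetry dependencies.
REQUIRED_PACKAGES = ("pandas", "notebook")


def check_environment(expected=EXPECTED_PYTHON, packages=REQUIRED_PACKAGES):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    found = tuple(sys.version_info[:2])
    if found != tuple(expected):
        problems.append(
            f"python version mismatch: expected {expected[0]}.{expected[1]}, "
            f"found {found[0]}.{found[1]}"
        )
    for name in packages:
        # find_spec returns None when a top-level package is not installed.
        if find_spec(name) is None:
            problems.append(f"package not found: {name}")
    return problems
```

Printing the result of `check_environment()` lists anything that looks off; an empty list suggests the interpreter and packages are in place.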