ImputeBench implements 13 recovery techniques for blocks of missing values in time series and evaluates their precision and runtime on various real-world time series datasets using different recovery scenarios. Technical details can be found in our PVLDB 2020 paper: Mind the Gap: An Experimental Evaluation of Imputation of Missing Values Techniques in Time Series. The benchmark makes it easy to integrate new algorithms and datasets.
- The benchmark implements the following algorithms: CDRec, DynaMMo, GROUSE, ROSL, SoftImpute, SPIRIT, SSA, STMVL, SVDImpute, SVT, TeNMF, TRMF, and TKCM.
- All the datasets used in this benchmark can be found in the folder `Datasets`.
- The full list of recovery scenarios can be found here.
Prerequisites | Build | Execution | Algorithm and Dataset Insertion | Citation | Award
- Ubuntu 16 or Ubuntu 18 (including Ubuntu derivatives, e.g., Xubuntu) or the same distribution under WSL.
- Clone this repository.
- Mono: Install mono from https://www.mono-project.com/download/stable/ (takes a few minutes).
- Build the Testing Framework using the installation script located in the root folder (takes ~1 min):
$ sh install_linux.sh
$ cd TestingFramework/bin/Debug/
$ mono TestingFramework.exe [arguments]
| -alg | -d | -scen |
|---|---|---|
| cdrec | airq | miss_perc |
| dynammo | bafu | ts_length |
| grouse | chlorine | ts_nbr |
| rosl | climate | miss_disj |
| softimp | drift10 | miss_over |
| svdimp | electricity | mcar |
| svt | meteo | blackout |
| stmvl | temp | all |
| spirit | bafu_red | |
| tenmf | drift10_red | |
| tkcm | all | |
| trmf | | |
| all | | |
All results will be added to the `Results` folder. The accuracy results and plots of all algorithms will be sequentially added for each scenario and dataset to `Results/.../.../error/`. The runtime results and plots of all algorithms will be added to `Results/.../.../runtime/`.
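Since accuracy and runtime outputs end up in separate `error/` and `runtime/` subfolders, a small helper can list them after a run. The sketch below is only an illustration: `list_result_dirs` is a hypothetical function, not part of the benchmark, and it assumes the `Results` folder is reachable from the current directory (pass another root otherwise).

```shell
#!/bin/sh
# list_result_dirs is a hypothetical helper, not shipped with the benchmark.
# It prints every directory named "error" or "runtime" below the given root
# (default: Results), sorted by path.
list_result_dirs() {
    root="${1:-Results}"
    find "$root" -type d \( -name error -o -name runtime \) | sort
}

# Only list results if the folder exists in the current directory.
if [ -d Results ]; then
    list_result_dirs Results
fi
```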
- Run the whole benchmark (all algorithms, all datasets, all scenarios, precision and runtime)
$ mono TestingFramework.exe -alg all -d all -scen all
Warning: Running the whole benchmark will take a sizeable amount of time (up to 4 days depending on the hardware) and will produce up to 15GB of output files with all recovered data and plots unless stopped early.
- Run a single algorithm (cdrec) on a single dataset (drift10) using one scenario (missing percentage)
$ mono TestingFramework.exe -alg cdrec -d drift10 -scen miss_perc
- Run two algorithms (spirit, cdrec) on a single dataset (drift10) using one scenario (missing percentage)
$ mono TestingFramework.exe -alg spirit,cdrec -d drift10 -scen miss_perc
- Run the previous example without runtime results
$ mono TestingFramework.exe -alg spirit,cdrec -d drift10 -scen miss_perc -nort
- Additional command-line parameters
$ mono TestingFramework.exe --help
Remark: Algorithms `tkcm`, `spirit` and `ssa` cannot handle multiple incomplete time series. These algorithms will not produce results for scenarios `miss_disj`, `miss_over`, `mcar` and `blackout`.
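When scripting several runs, it can help to skip combinations that yield no output. The following is a minimal sketch, assuming the restriction applies to all three algorithms named in the remark; `produces_results` is a hypothetical helper, not a benchmark command.

```shell
#!/bin/sh
# produces_results is a hypothetical helper, not part of the benchmark.
# It returns 0 if the algorithm is expected to produce results for the
# scenario and 1 otherwise, based only on the remark above.
produces_results() {
    alg="$1"
    scen="$2"
    case "$alg" in
        tkcm|spirit|ssa)
            case "$scen" in
                miss_disj|miss_over|mcar|blackout) return 1 ;;
            esac
            ;;
    esac
    return 0
}

# Report which scenarios are worth running for spirit.
for scen in miss_perc mcar; do
    if produces_results spirit "$scen"; then
        echo "spirit/$scen: run"
    else
        echo "spirit/$scen: skip"
    fi
done
```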
- You can parametrize each algorithm using the command `-algx`. For example, you can run the svdimp algorithm with a reduction value of 4 on the drift10 dataset while varying the sequence length as follows:
$ mono TestingFramework.exe -algx svdimp 4 -d drift10 -scen ts_nbr
- If you want to run some algorithms with default parameters and some with customized ones, you can use `-alg` and `-algx` together. For example, you can run the stmvl algorithm with its default parameter and the cdrec algorithm with a reduction value of 4 on the airq dataset while varying the sequence length as follows:
$ mono TestingFramework.exe -alg stmvl -algx cdrec 4 -d airq -scen ts_nbr
Remark: The command `-algx` cannot be applied to a group of algorithms and must therefore precede the name of each algorithm.
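Because `-algx` has to be repeated for every parametrized algorithm, a script can expand a list of algorithm/parameter pairs into the corresponding flags. This is a sketch only: `build_algx_args` and its `name:param` input format are hypothetical, and the example prints the command (a dry run) rather than executing it.

```shell
#!/bin/sh
# build_algx_args is a hypothetical helper, not part of the benchmark.
# It expands "name:param" pairs into one "-algx name param" group per
# algorithm, since -algx cannot be shared by a group of algorithms.
build_algx_args() {
    args=""
    for pair in "$@"; do
        name="${pair%%:*}"   # text before the first colon
        param="${pair#*:}"   # text after the first colon
        args="$args -algx $name $param"
    done
    # Drop the leading space before printing.
    echo "${args# }"
}

# Dry run: print the resulting command instead of executing it.
echo "mono TestingFramework.exe $(build_algx_args cdrec:4 svdimp:4) -d airq -scen ts_nbr"
```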
- To add your own algorithm to the benchmark, please refer to this tutorial.
- To add your own dataset:
  - Import the file to `TestingFramework/bin/Debug/data/{name}/{name}_normal.txt` (`name` is the name of your data).
  - Requirements: rows >= 1'000, columns >= 10, column separator: empty space, row separator: newline.
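The dataset requirements above can be checked before importing a file. The sketch below is an assumption-laden illustration: `check_dataset` is a hypothetical helper (not part of the benchmark), it additionally assumes every row should have the same number of columns, and `mydata` is a placeholder dataset name.

```shell
#!/bin/sh
# check_dataset is a hypothetical helper, not part of the benchmark.
# It checks a space-separated file with one row per line for:
#   - at least 1000 rows
#   - at least 10 columns
#   - the same number of columns in every row (an added assumption)
# It prints one message per problem and returns non-zero if any is found.
check_dataset() {
    awk '
        NR == 1 { cols = NF }
        NF != cols {
            printf "row %d has %d columns, expected %d\n", NR, NF, cols
            bad = 1
        }
        END {
            if (NR < 1000) { printf "only %d rows, need at least 1000\n", NR; bad = 1 }
            if (cols < 10) { printf "only %d columns, need at least 10\n", cols; bad = 1 }
            exit bad + 0
        }
    ' "$1"
}

# "mydata" is a placeholder; the check runs only if the file exists.
file="TestingFramework/bin/Debug/data/mydata/mydata_normal.txt"
if [ -f "$file" ]; then
    check_dataset "$file" && echo "dataset meets the requirements"
fi
```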
@inproceedings{imputebench2020vldb,
author = {Mourad Khayati and Alberto Lerner and Zakhar Tymchenko and Philippe Cudr{\'{e}}{-}Mauroux},
title = {Mind the Gap: An Experimental Evaluation of Imputation of Missing Values Techniques in Time Series},
booktitle = {Proceedings of the VLDB Endowment},
volume = {13},
number = {5},
year = {2020}
}
ImputeBench has received the VLDB 2020 Most Reproducible Paper Award.