/pyampute

Missing data amputation and exploration functions for Python

Primary LanguagePythonOtherNOASSERTION

pyampute

made-with-python code-coverage

A Python library for generating missing values in complete datasets (i.e. amputation) and exploration of incomplete datasets.

Check out the documentation and find examples!

Features

Amputation is the opposite of imputation: the generation of missing values in complete datasets. This is useful for evaluating the effect of missing values in your model, mostly in experimental settings, but also as a preprocessing step in developing models.

Our MultivariateAmputation class is compatible with the scikit-learn-style fit and transform paradigm and can be used in a scikit-learn Pipeline.

The underlying methodology has been proposed by Schouten, Lugtig and Vink (2018) and has been implemented in an R-function as well: mice::ampute. Compared to ampute, pyampute's parameters are easier to specify and allow for more variation. See this blogpost to learn more.

import numpy as np
from pyampute.ampute import MultivariateAmputation
n = 1000
m = 10
rng = np.random.default_rng()
X_compl = rng.standard_normal((m, n))
ma = MultivariateAmputation()
X_incompl = ma.fit_transform(X_compl)

Among others, we also provide an mdPatterns class, which displays missing data patterns in incomplete datasets.

from pyampute.exploration.md_patterns import mdPatterns
mdp = mdPatterns()
patterns = mdp.get_patterns(X_incompl)

Installation

Python Package Index (PyPI)

pip install pyampute

From source

git clone https://github.com/RianneSchouten/pyampute.git
pip install ./pyampute

License

BSD 3-Clause License

Citation

@misc{schouten_rianne_m_2022_6946887,
author       = {Schouten, Rianne M and
               Zamanzadeh, Davina and
               Singh, Prabhant},
title        = {pyampute: a Python library for data amputation},
month        = aug,
year         = 2022,
publisher    = {Zenodo},
doi          = {10.25080/majora-212e5952-03e},
url          = {https://doi.org/10.25080/majora-212e5952-03e}
}

@article{Schouten2018,
title={Generating missing values for simulation purposes: {A} multivariate amputation procedure},
author={Schouten, Rianne M. and Lugtig, Peter and Vink, Gerko},
journal={Journal of Statistical Computation and Simulation},
volume={88},
number={15},
pages={2909--2930},
year={2018}
}

Watch our SciPy'22 presentation here.

Contact details

For questions, comments and if you would like to contribute, please do not hesitate to contact us. You can find our contact details here.

Cheers,