Provides helpers in Python (pandas, NumPy, SciPy) to perform automated data cleaning and preprocessing steps.
- Introduction
- Installation
- Version
- Usage
- Thanks
This package provides several classes for data cleaning.
This module contains:
- A DataExploration class with methods to count missing values and detect constant columns; DataExploration.structure provides a nice exploration summary.
- An OutliersDetection class, a simple class designed to detect 1d outliers.
- A NaImputer class, a simple class focused on exploring and understanding missing values (Rubin's theory: MCAR/MAR/MNAR).
- An examples folder with notebooks illustrating the package.
- A test.py file containing tests (you can run them with $ python -m unittest -v test).
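To give a feel for the kind of checks listed above, here is a minimal sketch written in plain pandas and NumPy. It does not use this package's classes: the function names `count_missing`, `constant_columns` and `iqr_outliers` are hypothetical, and the IQR rule is only an assumed choice for 1d outlier detection, not necessarily what OutliersDetection does.

```python
import numpy as np
import pandas as pd


def count_missing(df: pd.DataFrame) -> pd.Series:
    """Return the number of missing values in each column."""
    return df.isna().sum()


def constant_columns(df: pd.DataFrame) -> list:
    """Return the columns holding at most one distinct non-missing value."""
    return [col for col in df.columns if df[col].nunique(dropna=True) <= 1]


def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values lying more than k * IQR outside the first/third quartile.

    Missing values are never flagged, because comparisons with NaN are False.
    """
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


# Small illustrative dataset (made up for this sketch).
df = pd.DataFrame({
    "age": [23, 25, np.nan, 24, 26, 95],
    "country": ["FR", "FR", "FR", "FR", "FR", "FR"],
    "score": [1.0, np.nan, np.nan, 2.5, 3.0, 2.0],
})

print(count_missing(df))        # missing values per column
print(constant_columns(df))     # ['country']
print(df[iqr_outliers(df["age"])])  # rows whose age is flagged as an outlier
```

Returning a boolean mask from `iqr_outliers` keeps the helper composable: the same mask can be used to inspect, drop, or replace the flagged rows.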
Installation via pip is not available yet (coming soon).
- Clone the project to your local computer.
- Run the following command:
$ python setup.py install
The current version is 0.1 (early release version). The module will be improved over time.
To be completed.
Anyone is welcome to submit pull requests and to check the existing issues for bugs or enhancements to work on. If you have an idea for an extension to auto-clean, create a new issue so we can discuss it.
Thanks to all the creators and contributors of the packages we are using.