Feature-engine is a Python library that contains several transformers to engineer features for use in machine learning models. Feature-engine's transformers follow Scikit-learn-like functionality, with fit() and transform() methods that first learn the transformation parameters from the data and then transform the data. Feature-engine's transformers currently include functionality for:
- Missing data imputation
- Categorical variable encoding
- Outlier removal
- Discretisation
- Numerical variable transformation
- Documentation: http://feature-engine.readthedocs.io
- Home page: https://www.trainindata.com/feature-engine
- MeanMedianImputer
- RandomSampleImputer
- EndTailImputer
- AddNaNBinaryImputer
- CategoricalVariableImputer
- FrequentCategoryImputer
- ArbitraryNumberImputer
- CountFrequencyCategoricalEncoder
- OrdinalCategoricalEncoder
- MeanCategoricalEncoder
- WoERatioCategoricalEncoder
- OneHotCategoricalEncoder
- RareLabelCategoricalEncoder
- Winsorizer
- ArbitraryOutlierCapper
- EqualFrequencyDiscretiser
- EqualWidthDiscretiser
- DecisionTreeDiscretiser
- LogTransformer
- ReciprocalTransformer
- PowerTransformer
- BoxCoxTransformer
- YeoJohnsonTransformer
pip install feature_engine
or
git clone https://github.com/solegalli/feature_engine.git
from feature_engine.categorical_encoders import RareLabelEncoder
rare_encoder = RareLabelEncoder(tol=0.05, n_categories=5)
rare_encoder.fit(data, variables=['Cabin', 'Age'])
data_encoded = rare_encoder.transform(data)
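To make the fit() and transform() pattern in the example above more concrete, here is a minimal, dependency-free sketch of what grouping rare labels could look like. It is not Feature-engine's implementation: the class name `RareLabelGrouper`, the `replace_with` parameter, the dict-of-lists data layout, and the exact meaning given to `tol` (minimum relative frequency for a label to be kept) and `n_categories` (minimum number of distinct labels a variable needs before any grouping happens) are all assumptions made for illustration.

```python
from collections import Counter


class RareLabelGrouper:
    """Hypothetical sketch of a rare-label encoder; not Feature-engine's code."""

    def __init__(self, tol=0.05, n_categories=5, replace_with="Rare"):
        self.tol = tol                    # assumed: minimum relative frequency to keep a label
        self.n_categories = n_categories  # assumed: minimum distinct labels before grouping
        self.replace_with = replace_with  # hypothetical name for the grouped label

    def fit(self, data, variables):
        # Learn, per variable, which labels are frequent enough to keep.
        # `data` is assumed to be a dict mapping column name -> list of labels.
        self.frequent_ = {}
        for var in variables:
            values = data[var]
            counts = Counter(values)
            if len(counts) < self.n_categories:
                # Too few distinct labels: keep every label unchanged.
                self.frequent_[var] = set(counts)
            else:
                self.frequent_[var] = {
                    label
                    for label, n in counts.items()
                    if n / len(values) >= self.tol
                }
        return self

    def transform(self, data):
        # Apply the learned label sets to a copy; the input is not modified.
        out = {col: list(vals) for col, vals in data.items()}
        for var, keep in self.frequent_.items():
            out[var] = [v if v in keep else self.replace_with for v in out[var]]
        return out


data = {
    "Cabin": ["A", "A", "A", "A", "B", "B", "B", "C", "D", "E"],
    "Sex": ["m", "f", "m", "f", "m", "f", "m", "f", "m", "f"],
}

grouper = RareLabelGrouper(tol=0.2, n_categories=3)
grouper.fit(data, variables=["Cabin", "Sex"])
encoded = grouper.transform(data)
print(encoded["Cabin"])
```

In this toy data, "C", "D" and "E" each account for 10% of the "Cabin" values, which is below the 20% tolerance, so they are grouped under a single label, while "Sex" has fewer than three distinct labels and is left untouched. The parameters are learned once in fit() and reused in transform(), so the same grouping can later be applied to new data.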
See more usage examples in the Jupyter notebooks in the example folder of this repository, or in the documentation: http://feature-engine.readthedocs.io
BSD 3-Clause
- Soledad Galli - Initial work - Feature Engineering Online Course.
Much of the engineering and encoding functionality is inspired by this series of articles from the 2009 KDD competition.
To learn more about the rationale, functionality, pros and cons of each imputer, encoder and transformer, refer to the Feature Engineering Online Course.
For a summary of the methods, check this presentation and this article.
To stay up to date with the latest releases, sign up at trainindata.