Jupyter notebooks with demos of Feature-engine's functionality
Feature-engine is a Python library with multiple transformers to engineer and select features for use in machine learning models. Feature-engine's transformers follow the scikit-learn API: fit() learns the transformation parameters from the data, and transform() then applies them to the data.
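The fit/transform pattern can be sketched with a toy imputer in plain Python. This is only an illustration of the pattern: the class `ToyMedianImputer`, its `imputer_dict_` attribute, and the sample rows are hypothetical and are not Feature-engine code. The library's own transformers work on pandas DataFrames and are demonstrated in the notebooks.

```python
from statistics import median


class ToyMedianImputer:
    """Hypothetical transformer that illustrates the fit/transform pattern."""

    def __init__(self, variables):
        # Names of the variables whose missing values should be filled.
        self.variables = variables

    def fit(self, rows):
        # Learn one parameter per variable: the median of its observed values.
        self.imputer_dict_ = {
            var: median(row[var] for row in rows if row[var] is not None)
            for var in self.variables
        }
        return self

    def transform(self, rows):
        # Apply the learned parameters; build new rows and leave the input untouched.
        return [
            {
                key: self.imputer_dict_[key]
                if key in self.imputer_dict_ and value is None
                else value
                for key, value in row.items()
            }
            for row in rows
        ]


# Made-up data, for illustration only.
train = [
    {"age": 20, "city": "Leeds"},
    {"age": None, "city": "York"},
    {"age": 40, "city": "Bath"},
]
test = [{"age": None, "city": "Hull"}]

imputer = ToyMedianImputer(variables=["age"]).fit(train)  # parameters come from train only
train_t = imputer.transform(train)
test_t = imputer.transform(test)  # the same learned median is reused on new data
```

Separating the two steps means the parameters are learned once, on the training data, and then reused unchanged on any later data set.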
In this repo, you will find many examples of how to use Feature-engine's transformers on various datasets. The notebooks are organised into the following folders, each covering the transformers listed:
creation
- MathematicalCombination
- CombineWithReferenceFeature
- CyclicalTransformer - notebook wanted, please contribute
discretisation
- EqualFrequencyDiscretiser
- EqualFrequencyDiscretiser plus WoEEncoder
- EqualWidthDiscretiser
- EqualWidthDiscretiser plus OrdinalEncoder
- DecisionTreeDiscretiser
- ArbitraryDiscretiser
- ArbitraryDiscretiser plus MeanEncoder
encoding
- OneHotEncoder
- OrdinalEncoder
- CountFrequencyEncoder
- MeanEncoder
- WoEEncoder
- PRatioEncoder
- RareLabelEncoder
- DecisionTreeEncoder
imputation
- MeanMedianImputer
- RandomSampleImputer
- EndTailImputer
- AddMissingIndicator
- CategoricalImputer
- ArbitraryNumberImputer
- DropMissingData - notebook wanted, please contribute
outliers
- Winsorizer
- ArbitraryOutlierCapper
- OutlierTrimmer
pipelines
- create new features - wine data
- regression pipeline - house prices data
- more notebooks wanted, please contribute
transformation
- LogTransformer
- LogCpTransformer
- ReciprocalTransformer
- PowerTransformer
- BoxCoxTransformer
- YeoJohnsonTransformer
wrappers
- SklearnTransformerWrapper plus Scikit-learn's OneHotEncoder
- SklearnTransformerWrapper plus Scikit-learn's feature selection classes
- SklearnTransformerWrapper plus Scikit-learn's KBinsDiscretizer
- SklearnTransformerWrapper plus Scikit-learn's Scalers
- SklearnTransformerWrapper plus Scikit-learn's SimpleImputer
selection
- notebooks wanted, please contribute
Contributing
We welcome notebooks from users of the package. To create one of the missing notebooks, or to add a notebook of your own, make a pull request with the code. Please make sure that the data set you use is free to share.
How to contribute:
Local Setup Steps
- Fork the repo
- Clone your fork to your local computer:
git clone https://github.com/<YOURUSERNAME>/feature-engine-examples.git
- cd into the repo:
cd feature-engine-examples
- If you haven't done so yet, install Feature-engine:
pip install feature_engine
- Create a feature branch with a meaningful name for your notebook:
git checkout -b mynotebookbranch
- Develop your notebook
- Commit your changes and push them to your fork:
git add .
git commit -m "a meaningful commit message"
git push origin mynotebookbranch:mynotebookbranch
- Go to your fork on GitHub and open a pull request to this repo
- Done
Thank you!!
Feature-engine features in the following resources:
Blogs about Feature-engine:
- Feature-engine: A new open-source Python package for feature engineering
- Practical Code Implementations of Feature Engineering for Machine Learning with Python