Code Repository for the online course Feature Engineering for Machine Learning

Python versions: 3.6, 3.7, 3.8

Feature Engineering for Machine Learning - Code Repository

Published: November 2017. Last updated: December 2020.

Table of Contents

  1. Introduction: Variable Types

    1. Numerical Variables: Discrete and continuous
    2. Categorical Variables: Nominal and Ordinal
    3. Datetime variables
    4. Mixed variables: strings and numbers
  2. Variable Characteristics

    1. Missing Data
    2. Cardinality
    3. Category Frequency
    4. Distributions
    5. Outliers
    6. Magnitude
  3. Missing Data Imputation

    1. Mean and Median Imputation
    2. Arbitrary value imputation
    3. End of Tail Imputation
    4. Frequent category imputation
    5. Adding a "Missing" string category
    6. Random Sample Imputation
    7. Adding a missing indicator
    8. Imputation with Scikit-learn
    9. Imputation with Feature-engine
  4. Multivariate Imputation

    1. MICE
  5. Categorical Variable Encoding

    1. One hot encoding: simple and of frequent categories
    2. Ordinal encoding: arbitrary and ordered
    3. Target mean encoding
    4. Weight of evidence
    5. Probability Ratio
    6. Rare Label encoding
    7. Encoding with Scikit-learn
    8. Encoding with Feature-engine
    9. Encoding with category encoders
  6. Variable Transformation

    1. Log, power and reciprocal
    2. Box-Cox
    3. Yeo-Johnson
    4. Transformation with Scikit-learn
    5. Transformation with Feature-engine
  7. Discretisation

    1. Arbitrary
    2. Equal-frequency discretisation
    3. Equal-width discretisation
    4. K-means discretisation
    5. Discretisation with trees
    6. Discretisation with Scikit-learn
    7. Discretisation with Feature-engine
  8. Outliers

    1. Capping
    2. Trimming
  9. Feature Scaling

    1. Standardisation
    2. MinMaxScaling
    3. MaxAbsoluteScaling
    4. RobustScaling
  10. Mixed variables

    1. Creating new variables from strings and numbers
  11. Datetime

    1. Extracting day, month, week, etc.
    2. Extracting hour, minute, second, etc.
    3. Capturing elapsed time
    4. Working with timezones
  12. Pipelines

    1. Classification Pipeline
    2. Regression Pipeline
    3. Pipeline with cross-validation

Links