Python Feature Engineering Cookbook - Code Repository

Python 3.6

Published January 22nd, 2020

Paperback: 372 pages
Publisher: Packt Publishing
Language: English
ISBN: 9781789806311

Table of Contents and Recipes

  1. Foreseeing Variable Problems in Building ML Models

    1. Identifying numerical and categorical variables
    2. Quantifying missing data
    3. Determining cardinality in categorical variables
    4. Pinpointing rare categories in categorical variables
    5. Identifying a linear relationship
    6. Identifying normal distributions
    7. Distinguishing variable distribution
    8. Highlighting outliers
    9. Comparing feature magnitude
  2. Missing Data Imputation

    1. Removing observations with missing data
    2. Performing mean or median imputation
    3. Implementing mode or frequent category imputation
    4. Replacing missing values by an arbitrary number
    5. Capturing missing values in a bespoke category
    6. Replacing missing values by a value at the end of the distribution
    7. Implementing random sample imputation
    8. Adding a missing value indicator variable
    9. Performing multivariate imputation by chained equations (MICE)
    10. Assembling an imputation pipeline with scikit-learn
    11. Assembling an imputation pipeline with feature-engine
  3. Encoding Categorical Variables

    1. Creating binary variables through one-hot encoding
    2. Performing one-hot encoding of frequent categories
    3. Replacing categories by ordinal numbers
    4. Replacing categories by counts or frequency of observations
    5. Encoding with integers in an ordered manner
    6. Encoding with the mean of the target
    7. Encoding with the weight of evidence
    8. Grouping rare or infrequent categories
    9. Performing binary encoding
    10. Performing feature hashing
  4. Transforming Numerical Variables

    1. Transforming variables with the logarithm
    2. Transforming variables with the reciprocal function
    3. Using square and cube root to transform variables
    4. Using power transformations on numerical variables
    5. Performing Box-Cox transformation on numerical variables
    6. Carrying out Yeo-Johnson transformation on numerical variables
  5. Performing Variable Discretization

    1. Dividing the variable into intervals of equal width
    2. Sorting the variable values into intervals of equal frequency
    3. Performing discretization followed by categorical encoding
    4. Allocating the variable values into arbitrary intervals
    5. Performing discretization with k-means
    6. Using decision trees for discretization
  6. Working with Outliers

    1. Trimming outliers from the data set
    2. Performing Winsorization
    3. Capping the variable at arbitrary maximum and minimum values
    4. Performing zero-coding – capping the variable at zero
  7. Deriving Features from Date and Time Variables

    1. Extracting date and time parts from a datetime variable
    2. Deriving representations of year and month
    3. Creating representations of day and week
    4. Extracting time parts from a time variable
    5. Capturing elapsed time between datetime variables
    6. Working with time in different timezones
  8. Performing Feature Scaling

    1. Standardizing the features
    2. Performing mean normalization
    3. Scaling to the maximum and minimum values
    4. Implementing maximum absolute scaling
    5. Scaling with the median and quantiles
    6. Scaling to vector unit length
  9. Applying Mathematical Computations to Features

    1. Combining multiple features with statistical operations
    2. Combining pairs of features with mathematical functions
    3. Performing polynomial expansion
    4. Deriving new features with decision trees
    5. Carrying out Principal Component Analysis
  10. Creating Features from Time Series and Transactional Data

    1. Aggregating transactions with mathematical operations
    2. Aggregating transactions in a time window
    3. Determining the number of local maxima and minima
    4. Deriving time elapsed between time-stamped events
    5. Creating features from transactions with Featuretools
  11. Extracting Features from Text Variables

    1. Counting characters, words and vocabulary
    2. Estimating text complexity by counting sentences
    3. Creating features with bag-of-words and n-grams
    4. Implementing term frequency-inverse document frequency
    5. Cleaning and stemming text variables