
A guide to encoding categorical variables, with implementations and examples in Python.


Encode-Categorical-Features

Handling categorical/qualitative variables is an important step in data preprocessing. Many machine learning algorithms cannot work with categorical variables directly, so we must convert them to numerical values first. How the categorical variables are encoded affects model performance: different encoding techniques can produce noticeably different results.
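
As a minimal sketch of that conversion (the "color" column and its values are purely illustrative), scikit-learn's LabelEncoder maps each distinct category to an integer code:

import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy data: one categorical column that a model cannot use as-is
df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# Each distinct category becomes an integer code
le = LabelEncoder()
df["color_code"] = le.fit_transform(df["color"])
# blue -> 0, green -> 1, red -> 2 (codes follow the sorted class order)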

Categorical variables can be divided into two types (see the sketch after this list):

  1. Nominal (no intrinsic order), e.g., city or colour
  2. Ordinal (ordered), e.g., shirt size: small < medium < large
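
A minimal sketch of the distinction, using hypothetical columns: a nominal column gets one indicator column per category, while an ordinal column gets integer ranks that follow an explicitly supplied order:

import pandas as pd
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

df = pd.DataFrame({
    "city": ["Delhi", "Mumbai", "Delhi"],   # nominal: no order between cities
    "size": ["small", "large", "medium"],   # ordinal: small < medium < large
})

# Nominal -> one-hot: one 0/1 column per distinct city
city_encoded = OneHotEncoder().fit_transform(df[["city"]]).toarray()

# Ordinal -> integer ranks that respect the supplied category order
size_encoded = OrdinalEncoder(
    categories=[["small", "medium", "large"]]
).fit_transform(df[["size"]])
# size_encoded: [[0.], [2.], [1.]]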

There are many ways to encode these categorical variables; a short usage sketch follows the library setup below:

  1. One Hot Encoding
  2. Label Encoding
  3. Ordinal Encoding
  4. Frequency or Count Encoding
  5. Binary Encoding
  6. Base-N Encoding
  7. Helmert Encoding
  8. Mean Encoding or Target Encoding
  9. Weight of Evidence Encoding
  10. Sum Encoder (Deviation Encoding or Effect Encoding)
  11. Leave One Out Encoding
  12. CatBoost Encoding
  13. James-Stein Encoding
  14. M-estimator Encoding
  15. Hashing Encoding
  16. Backward Difference Encoding
  17. Polynomial Encoding
  18. MultiLabelBinarizer

The following libraries are used to perform the encodings:

!pip install scikit-learn
!pip install category-encoders
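
Most encoders in both libraries share the same fit/transform pattern. Below is a minimal sketch (column names and data are hypothetical) showing one unsupervised and one supervised encoder from category_encoders:

import pandas as pd
import category_encoders as ce

df = pd.DataFrame({
    "fuel":  ["gas", "diesel", "gas", "electric"],
    "price": [10, 20, 12, 30],   # target, needed by supervised encoders
})

# Binary encoding (unsupervised): each category's index is written out
# as binary digits across a few 0/1 columns
binary = ce.BinaryEncoder(cols=["fuel"]).fit_transform(df[["fuel"]])

# Target / mean encoding (supervised): each category is replaced by a
# smoothed mean of the target within that category
target = ce.TargetEncoder(cols=["fuel"]).fit_transform(df[["fuel"]], df["price"])

The other encoders listed above follow the same pattern; the supervised ones (target, leave-one-out, CatBoost, James-Stein, M-estimator, weight of evidence) additionally take the target y in fit/fit_transform.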

The cheat-sheet below is a guiding tool for selecting an encoding method.


References

  1. Category Encoders Documentation
  2. https://medium.com/swlh/an-introduction-to-categorical-feature-encoding-in-machine-learning-cd0ca08c8232
  3. https://towardsdatascience.com/benchmarking-categorical-encoders-9c322bd77ee8
  4. https://towardsdatascience.com/all-about-categorical-variable-encoding-305f3361fd02
  5. https://towardsdatascience.com/an-easier-way-to-encode-categorical-features-d840ff6b3900