Data-Preprocessing

Data Preprocessing part 1: https://github.com/musama619/DataPreProcessing

  • Handling Missing Data

  • Encoding Independent Variables

    OneHotEncoder

  • Encoding Dependent Variable

    LabelEncoder

  • Train-Test Split

  • Feature Scaling

    Standardisation
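
As a rough illustration, the first four steps in the list above could look like the sketch below with pandas and scikit-learn. The toy dataset and its column names (`Country`, `Age`, `Salary`, `Purchased`) are assumptions made up for this example; they are not taken from the linked repository.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical toy dataset: one categorical feature, two numeric features, a yes/no target.
df = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain", "Germany", "France", "Spain", "France"],
    "Age": [44, 27, 30, 38, 40, 35, np.nan, 48],
    "Salary": [72000, 48000, 54000, 61000, np.nan, 58000, 52000, 79000],
    "Purchased": ["No", "Yes", "No", "No", "Yes", "Yes", "No", "Yes"],
})
X = df[["Country", "Age", "Salary"]].copy()
num_cols = ["Age", "Salary"]

# 1. Handling missing data: replace NaN in the numeric columns with the column mean.
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
X[num_cols] = imputer.fit_transform(X[num_cols])

# 2. Encoding independent variables: one-hot encode the categorical column,
#    pass the numeric columns through unchanged (sparse_threshold=0 forces a dense array).
ct = ColumnTransformer(
    transformers=[("onehot", OneHotEncoder(), ["Country"])],
    remainder="passthrough",
    sparse_threshold=0,
)
X_enc = ct.fit_transform(X)

# 3. Encoding the dependent variable: map the class labels to integers.
le = LabelEncoder()
y = le.fit_transform(df["Purchased"])

# 4. Train-test split: hold out 25% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X_enc, y, test_size=0.25, random_state=0
)
```

Here `X_enc` holds three dummy columns for `Country` followed by `Age` and `Salary`; `test_size` and `random_state` are arbitrary choices for the sketch.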

Note:

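1. Standardisation : (x - mean) / standard deviation; most values end up roughly between -3 and 3 (a safe default that works in most cases)

2. Normalisation : (x - min) / (max - min); values between 0 and 1 (recommended only when the features follow a normal distribution)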
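
The two formulas can be checked against scikit-learn's `StandardScaler` and `MinMaxScaler`. This is a minimal sketch; the feature values are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical single feature column.
x = np.array([[20.0], [30.0], [40.0], [50.0], [60.0]])

# Library versions.
standardised = StandardScaler().fit_transform(x)
normalised = MinMaxScaler().fit_transform(x)

# Manual versions of the same formulas.
# StandardScaler uses the population standard deviation (ddof=0), as does ndarray.std().
manual_std = (x - x.mean()) / x.std()
manual_norm = (x - x.min()) / (x.max() - x.min())

print(np.allclose(standardised, manual_std))
print(np.allclose(normalised, manual_norm))
```

Standardised values are centred on 0 with unit variance, while normalised values are squeezed into the range from 0 to 1.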

[Image: scaler]

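  • Remember to apply feature scaling after the train-test split, so that the scaler is fitted on the training set only and no information from the test set leaks into it
  • Feature scaling does not need to be applied to dummy variables: they are already 0 or 1, which lies within the -3 to 3 range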
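
The two points above can be sketched as follows. The feature matrix is a hypothetical, already-encoded example in which the first three columns are dummy variables and the last two are numeric; the column positions are an assumption of this sketch.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical encoded features: 3 dummy columns, then Age and Salary.
X = np.array([
    [1, 0, 0, 44, 72000],
    [0, 0, 1, 27, 48000],
    [0, 1, 0, 30, 54000],
    [0, 0, 1, 38, 61000],
    [0, 1, 0, 40, 63000],
    [1, 0, 0, 35, 58000],
    [0, 0, 1, 29, 52000],
    [1, 0, 0, 48, 79000],
], dtype=float)
y = np.array([0, 1, 0, 0, 1, 1, 0, 1])

# Split first, so the test rows play no part in computing the mean and std.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1
)

scaler = StandardScaler()
# Fit on the numeric training columns only; the dummy columns (0 to 2) are left untouched.
X_train[:, 3:] = scaler.fit_transform(X_train[:, 3:])
# Reuse the training mean and std on the test set; do not refit.
X_test[:, 3:] = scaler.transform(X_test[:, 3:])
```

Calling `fit_transform` on the training set and only `transform` on the test set keeps the test data unseen during fitting.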

Important link: