Training-a-Supervised-ML-Model

Clean your data
  • Check for any formatting errors (e.g., a date can be 5/7/2001 in one row and 10th May 2010 in the next)

  • Strings in numeric fields

  • Outliers: a row may have an extreme (and unbelievable) value, for example for number_of_bedrooms.

  • Missing values: a row may be missing a value, for example its price.

  • Misspellings: a row may contain a misspelled value, for example in the type column.

  • Duplicates

  • Nulls and NaN
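
The cleaning checks above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a definitive pipeline. The sample rows, the `DATE_FORMATS` list, the `TYPE_FIXES` map, and the `MAX_BEDROOMS` threshold are all assumptions invented for the example. Dates are assumed to be day-first, and rows with missing or unparseable values are simply dropped.

```python
import math
import re
from datetime import datetime

# Hypothetical raw rows, invented purely for illustration.
RAW_ROWS = [
    {"date": "5/7/2001", "type": "house", "number_of_bedrooms": "3", "price": "250000"},
    {"date": "10th May 2010", "type": "hous", "number_of_bedrooms": "4", "price": "310000"},
    {"date": "2012-03-15", "type": "apartment", "number_of_bedrooms": "two", "price": "180000"},
    {"date": "1/2/2015", "type": "house", "number_of_bedrooms": "1001", "price": "275000"},
    {"date": "3/4/2016", "type": "apartment", "number_of_bedrooms": "2", "price": ""},
    {"date": "5/7/2001", "type": "house", "number_of_bedrooms": "3", "price": "250000"},
]

DATE_FORMATS = ("%d/%m/%Y", "%d %B %Y", "%Y-%m-%d")  # assumed formats, day-first
TYPE_FIXES = {"hous": "house"}                        # assumed known misspellings
MAX_BEDROOMS = 20                                     # assumed plausibility limit


def parse_date(text):
    """Return a date for any known format, or None if nothing matches."""
    # Strip ordinal suffixes so "10th May 2010" becomes "10 May 2010".
    text = re.sub(r"(\d+)(st|nd|rd|th)\b", r"\1", text.strip())
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def parse_number(text):
    """Return a float, or None for empty, non-numeric, or NaN input."""
    try:
        value = float(text)
    except (TypeError, ValueError):
        return None
    return None if math.isnan(value) else value


def clean_rows(rows):
    """Normalize each row, then drop invalid, outlier, and duplicate rows."""
    cleaned, seen = [], set()
    for row in rows:
        kind = row["type"].strip().lower()
        record = {
            "date": parse_date(row["date"]),
            "type": TYPE_FIXES.get(kind, kind),
            "number_of_bedrooms": parse_number(row["number_of_bedrooms"]),
            "price": parse_number(row["price"]),
        }
        if any(value is None for value in record.values()):
            continue  # missing value, null/NaN, or string in a numeric field
        if record["number_of_bedrooms"] > MAX_BEDROOMS:
            continue  # unbelievable outlier
        key = tuple(record.values())
        if key in seen:
            continue  # exact duplicate of an earlier row
        seen.add(key)
        cleaned.append(record)
    return cleaned


cleaned = clean_rows(RAW_ROWS)
print(len(cleaned))  # 2
```

Dropping bad rows is only one option. Depending on the dataset, filling missing values (for example with a median) may be preferable to discarding them.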

Create New Features From Existing Features
  • Binning

    • Numeric Binning
    • Categorical Binning
  • Splitting

    • Date/Time Decomposition
    • Compound String Splitting
  • One-Hot Encoding

    • Encoding categories as plain integers introduces a problem. For example, if we assign numeric values to regions, such as Asia as 1 and Europe as 2, the model may infer that Europe is greater than Asia. One-hot encoding avoids this by giving each category its own 0/1 column.
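
The feature-creation techniques listed above can be illustrated with a small standard-library sketch. The price thresholds, the country-to-region map, the `location` and `sold_on` field names, and the example row are all hypothetical and chosen only for illustration. The sketch also assumes the compound string always contains a comma.

```python
from datetime import date

# Hypothetical thresholds and groupings, chosen only for illustration.
PRICE_BINS = [(0, 200_000, "low"), (200_000, 400_000, "mid"), (400_000, float("inf"), "high")]
COUNTRY_TO_REGION = {"Japan": "Asia", "India": "Asia", "France": "Europe", "Germany": "Europe"}
REGIONS = ["Asia", "Europe", "Other"]


def bin_numeric(value, bins=PRICE_BINS):
    """Numeric binning: map a number to the label of its [low, high) range."""
    for low, high, label in bins:
        if low <= value < high:
            return label
    return "unknown"


def bin_category(country):
    """Categorical binning: collapse many categories into a few groups."""
    return COUNTRY_TO_REGION.get(country, "Other")


def decompose_date(d):
    """Date/time decomposition: split one date into several numeric features."""
    return {"year": d.year, "month": d.month, "day": d.day, "weekday": d.weekday()}


def split_compound(text):
    """Compound string splitting: 'City, Country' becomes two features."""
    city, country = (part.strip() for part in text.split(",", 1))
    return {"city": city, "country": country}


def one_hot(region, categories=REGIONS):
    """One-hot encoding: one 0/1 column per category, so no order is implied."""
    return {f"region_{name}": int(region == name) for name in categories}


def build_features(row):
    """Combine the techniques above into one feature dictionary."""
    place = split_compound(row["location"])
    region = bin_category(place["country"])
    features = {"city": place["city"], "price_band": bin_numeric(row["price"])}
    features.update(decompose_date(row["sold_on"]))
    features.update(one_hot(region))
    return features


example = {"location": "Lyon, France", "price": 250_000, "sold_on": date(2010, 5, 10)}
features = build_features(example)
print(features["price_band"], features["region_Europe"], features["region_Asia"])  # mid 1 0
```

Because each region gets its own column, the model sees Europe and Asia as separate flags rather than as the ordered numbers 1 and 2.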