peeush-agarwal/amlp_book_code

Practice code for the book "Approaching (Almost) Any Machine Learning Problem"

Practice code from the book above, covering the following topics:

Supervised and unsupervised learning
Cross-validation
Evaluation metrics
Project structure for any ML project
Approaching categorical variables
1. One-hot encoding + logistic regression model
  - This gives an AUC score of ~0.78, which is good: AUC ranges from 0 to 1, where 1 is a perfect model and 0.5 is no better than random guessing.
2. Label encoding
  1. Random forest model
    - This gives an AUC score of ~0.71, which is worse than the logistic regression model.
    - This model also takes more time and space than the logistic regression model.
    - The takeaway: never skip a simple baseline model when approaching a problem.
  2. XGBoost model
    - This gives an AUC score of ~0.76, which is better than the random forest model but still worse than the logistic regression model.
    - This model also takes more time and space than both the logistic regression and random forest models.
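
The comparison above can be sketched with scikit-learn as follows. This is a minimal illustration, not the notebook code from this repository: the data is synthetic, the column names (`color`, `size`, `city`) are hypothetical, and `OrdinalEncoder` stands in for per-column `LabelEncoder` because it fits inside a pipeline. The hyperparameters are assumptions too, so the scores it prints will not match the ~0.78 and ~0.71 figures listed above.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Synthetic stand-in data (an assumption): three categorical columns and a
# binary target that depends weakly on two of them.
rng = np.random.default_rng(42)
n_rows = 2000
df = pd.DataFrame({
    "color": rng.choice(["red", "green", "blue"], size=n_rows),
    "size": rng.choice(["S", "M", "L", "XL"], size=n_rows),
    "city": rng.choice([f"city_{i}" for i in range(20)], size=n_rows),
})
signal = (df["color"] == "red").astype(int) + (df["size"] == "XL").astype(int)
y = (signal + rng.normal(0, 1, size=n_rows) > 1).astype(int)

# Two of the approaches from the list: a one-hot encoded linear baseline
# and an integer-encoded tree ensemble.
models = {
    "onehot_logreg": make_pipeline(
        OneHotEncoder(handle_unknown="ignore"),
        LogisticRegression(max_iter=1000),
    ),
    "ordinal_random_forest": make_pipeline(
        OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1),
        RandomForestClassifier(n_estimators=100, random_state=42),
    ),
}

# Stratified folds keep the class ratio similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

scores = {}
for name, model in models.items():
    fold_auc = cross_val_score(model, df, y, cv=cv, scoring="roc_auc")
    scores[name] = float(fold_auc.mean())
    print(f"{name}: mean AUC = {scores[name]:.3f}")
```

Keeping the encoder inside the pipeline means it is refit on the training portion of each fold, so categories seen only in the validation fold cannot leak into the encoding. An XGBoost model could be added to `models` in the same way if the `xgboost` package is installed.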