Driven_Data_Challenge

This repo contains a solution to the poverty-prediction challenge hosted by DrivenData.

---- Information from DrivenData ----

The data for this competition comes from The World Bank Development Data Group.

With funding from the World Bank's Knowledge for Change Program, this competition aims to engage data scientists from developing countries and apply a cost-effective solution to testing a diverse set of approaches to poverty prediction.

The surveys used come from three developing countries. Each country offers a different demographic makeup, so successful poverty prediction across these countries will help identify a robust set of predictors that can be used in future poverty measurement efforts.

--- Data Preprocessing ---

  • Standardization
  • Encoding
  • Column enforcement (same columns in the train and test sets)
  • Replacing missing values
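
The four steps above could be combined roughly as in the sketch below. It is illustrative only and not the code from this repo: the `preprocess` helper, the toy columns `age` and `roof`, the median/placeholder imputation, and the use of one-hot encoding are all assumptions made for the example.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

def preprocess(train, test):
    """Impute, one-hot encode, align columns, and standardize (hypothetical helper)."""
    train, test = train.copy(), test.copy()
    num_cols = train.select_dtypes(include="number").columns
    cat_cols = train.columns.difference(num_cols)

    # Replacing missing values: train medians for numeric columns,
    # a placeholder category for categorical columns (assumed strategy).
    medians = train[num_cols].median()
    train[num_cols] = train[num_cols].fillna(medians)
    test[num_cols] = test[num_cols].fillna(medians)
    train[cat_cols] = train[cat_cols].fillna("missing")
    test[cat_cols] = test[cat_cols].fillna("missing")

    # Encoding: one-hot encode the categorical columns.
    train = pd.get_dummies(train, columns=list(cat_cols), dtype=float)
    test = pd.get_dummies(test, columns=list(cat_cols), dtype=float)

    # Column enforcement: the test set gets exactly the training columns.
    # Categories unseen in training are dropped; missing ones are filled with 0.
    test = test.reindex(columns=train.columns, fill_value=0.0)

    # Standardization: fit the scaler on train only, then apply it to both.
    scaler = StandardScaler().fit(train[num_cols])
    train[num_cols] = scaler.transform(train[num_cols])
    test[num_cols] = scaler.transform(test[num_cols])
    return train, test

# Toy data standing in for the survey files (made up for this example).
train_df = pd.DataFrame({"age": [20.0, 30.0, None, 50.0],
                         "roof": ["tin", "tile", "tin", None]})
test_df = pd.DataFrame({"age": [40.0, None],
                        "roof": ["thatch", "tin"]})

X_train, X_test = preprocess(train_df, test_df)
print(X_train.columns.tolist())
print(X_test)
```

Fitting the imputation values and the scaler on the training set only keeps information from the test set out of the model inputs.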

--- Sampling ----

  • SMOTE
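
SMOTE balances classes by creating synthetic minority-class rows that lie between a real row and one of its nearest minority-class neighbours. The sketch below hand-rolls that interpolation idea with NumPy to show the mechanism; it is not the code used in this repo, and `smote_sketch` and its parameters are hypothetical names. In practice the `SMOTE` class from the imbalanced-learn package (with its `fit_resample` method) is the usual choice.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between minority-class neighbours.

    X_min holds only the minority-class rows; k must be smaller than len(X_min).
    """
    rng = np.random.default_rng(seed)

    # Pairwise Euclidean distances within the minority class.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    # Pick a random base row and one of its k nearest neighbours for each new row.
    base = rng.integers(0, len(X_min), size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]

    # Place the synthetic row at a random position on the segment between them.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[picked] - X_min[base])

# Four minority-class points on the corners of the unit square (toy data).
X_minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_sketch(X_minority, n_new=6)
print(synthetic)
```

Oversampling of this kind should be applied to the training split only, so that validation scores are measured on real rows.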

---- Dimensionality Reduction ---

  • SVD
  • PCA
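
As a rough illustration of the two techniques above, the sketch below applies scikit-learn's `TruncatedSVD` and `PCA` to random data. The data, the choice of 5 components, and the variable names are assumptions for the example and do not reflect the settings used in this repo.

```python
import numpy as np
from sklearn.decomposition import PCA, TruncatedSVD

# Random matrix standing in for the preprocessed feature table.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))

# PCA centers the data before projecting onto the top components.
pca = PCA(n_components=5, random_state=0)
X_pca = pca.fit_transform(X)

# TruncatedSVD skips centering, so it also works on sparse one-hot matrices.
svd = TruncatedSVD(n_components=5, random_state=0)
X_svd = svd.fit_transform(X)

print(X_pca.shape, X_svd.shape)
print("variance kept by PCA:", pca.explained_variance_ratio_.sum())
```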

----- Algorithms Experimented With ----

  • Random Forest
  • Logistic Regression
  • AdaBoost Classifier
  • Gradient Boosting
  • Decision Tree
  • Gaussian NB
  • Extra Tree Classifier
  • XGBoost
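
One way to compare candidates like these is to cross-validate each on the same data and rank the scores. The sketch below does that on a synthetic dataset; the data, the ROC AUC metric, the 3-fold split, and the hyperparameters are assumptions for the example, not the setup or results of this repo. XGBoost is left out only to keep the example free of extra dependencies; `xgboost.XGBClassifier` follows the scikit-learn estimator interface and could be added to the same dictionary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, ExtraTreesClassifier,
                              GradientBoostingClassifier, RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced binary problem standing in for the survey data.
X, y = make_classification(n_samples=300, n_features=20,
                           weights=[0.8, 0.2], random_state=0)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=50, random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Gaussian NB": GaussianNB(),
    "Extra Trees": ExtraTreesClassifier(n_estimators=50, random_state=0),
}

# Mean 3-fold cross-validated ROC AUC for each model.
scores = {
    name: cross_val_score(model, X, y, cv=3, scoring="roc_auc").mean()
    for name, model in models.items()
}

# Print the models from best to worst score.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} {score:.3f}")
```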

---- Winning Algorithm ----

  • XGBoost without any sampling or dimensionality reduction