Salary prediction using the Census Bureau (UCI Adult) database
- Read dataset from "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data"
- Explore dataset using pandas
- Perform preprocessing (handle missing/duplicate/categorical data)
- Check feature importance through random forest classifier
- Select the top n features, then apply several classification models (e.g. logistic regression, decision tree classifier, bagging classifier, random forest classifier)
- Select the best model (using accuracy_score / roc_auc_score), then analyse its performance using a) confusion matrix, b) precision, c) recall, d) F1-score, e) ROC curve and AUC
- Draw conclusions from all the metrics computed for the best model
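
The steps above could be sketched roughly as follows with pandas and scikit-learn. This is a minimal sketch under stated assumptions, not the project's actual code: the column names follow the UCI `adult.names` description (with underscores), the target name `income` and the helper names (`load_adult`, `preprocess`, `rank_features`, `compare_models`, `report`) are hypothetical, rows with missing values are simply dropped, categoricals are one-hot encoded, inputs to logistic regression are standardized, and the best model is picked by ROC AUC on a 25% hold-out split.

```python
import pandas as pd
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, roc_auc_score, roc_curve)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data"
# adult.data has no header row; names are assumed from the UCI adult.names file.
COLUMNS = ["age", "workclass", "fnlwgt", "education", "education_num",
           "marital_status", "occupation", "relationship", "race", "sex",
           "capital_gain", "capital_loss", "hours_per_week",
           "native_country", "income"]  # "income" is a hypothetical target name


def load_adult(source=URL):
    # Fields are separated by ", " and missing values are written as "?".
    return pd.read_csv(source, header=None, names=COLUMNS,
                       na_values="?", skipinitialspace=True)


def preprocess(df, target="income"):
    # Assumption: drop rows with missing values and exact duplicate rows.
    df = df.dropna().drop_duplicates()
    y = (df[target].str.strip() == ">50K").astype(int)  # 1 means income >50K
    # One-hot encode categorical columns; numeric columns pass through.
    X = pd.get_dummies(df.drop(columns=[target]), dtype=float)
    return X, y


def rank_features(X, y, seed=0):
    # Impurity-based importances from a random forest, highest first.
    forest = RandomForestClassifier(n_estimators=100, random_state=seed).fit(X, y)
    return pd.Series(forest.feature_importances_,
                     index=X.columns).sort_values(ascending=False)


def compare_models(X, y, n_features=10, seed=0):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed)
    # Rank features on the training split only, to avoid leaking test data.
    top = rank_features(X_train, y_train, seed).index[:n_features]
    models = {
        "logistic regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)),
        "decision tree": DecisionTreeClassifier(random_state=seed),
        "bagging": BaggingClassifier(random_state=seed),
        "random forest": RandomForestClassifier(random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train[top], y_train)
        proba = model.predict_proba(X_test[top])[:, 1]
        scores[name] = {
            "accuracy": accuracy_score(y_test, model.predict(X_test[top])),
            "roc_auc": roc_auc_score(y_test, proba),
        }
    best = max(scores, key=lambda name: scores[name]["roc_auc"])
    return pd.DataFrame(scores).T, models[best], (X_test[top], y_test)


def report(model, X_test, y_test):
    pred = model.predict(X_test)
    proba = model.predict_proba(X_test)[:, 1]
    fpr, tpr, _ = roc_curve(y_test, proba)  # points for plotting the ROC curve
    return {
        "confusion_matrix": confusion_matrix(y_test, pred),
        # Text table with per-class precision, recall and F1-score.
        "classification_report": classification_report(y_test, pred),
        "roc_curve": (fpr, tpr),
        "roc_auc": roc_auc_score(y_test, proba),
    }
```

With network access, `X, y = preprocess(load_adult())` followed by `scores, best, held_out = compare_models(X, y)` and `report(best, *held_out)` would walk through the steps in order. Dropping incomplete rows is only one way to handle missing data; imputing the most frequent category is a common alternative.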