Predict whether a loan case will be paid off or not given data on previous loans.
- Build classifiers using several machine learning algorithms:
- k-nearest neighbour;
- Decision Tree;
- Support Vector Machine;
- Logistic regression.
- Use accuracy metrics to select best classifier.
The following table summarizes the accuracy metrics obtained for each model (with parameters chosen to maximize accurary but avoid overfitting).
Model | Jaccard | F1 score | Log loss |
---|---|---|---|
KNN | 0.814286 | 0.897638 | NA |
DecisionTree | 0.846154 | 0.916667 | NA |
SVM | 0.814286 | 0.730934 | NA |
LogisticRegression | 0.814286 | 0.730934 | 0.578891 |
The best model is the Decision Tree. It presented the highest Jaccard and F1 scores, and it predicted less wrong payoffs. Check out its confusion matrix:
This is the Capstone Project of the course Machine Learning with Python, an online non-credit course authorized by IBM Skills Network and offered through Coursera.