The purpose of this analysis is to build models using different machine learning methods and determine which one performs best.
- The Naive Random Oversampling method produced a balanced accuracy score of 0.6522, a high risk precision score of 0.01, a high risk recall score of 0.68, a low risk precision score of 1.00, and a low risk recall score of 0.62.
- The SMOTE method produced a balanced accuracy score of 0.6543, a high risk precision score of 0.01, a high risk recall score of 0.61, a low risk precision score of 1.00, and a low risk recall score of 0.69.
- The undersampling method produced a balanced accuracy score of 0.5402, a high risk precision score of 0.01, a high risk recall score of 0.68, a low risk precision score of 1.00, and a low risk recall score of 0.40.
- The combination over/under sampling method produced a balanced accuracy score of 0.6607, a high risk precision score of 0.01, a high risk recall score of 0.77, a low risk precision score of 1.00, and a low risk recall score of 0.55.
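
The balanced accuracy scores above can be sanity-checked against the recall scores. For a two-class problem, balanced accuracy is the unweighted mean of the per-class recalls. The sketch below is not the code used in this analysis. It only redoes that arithmetic on the figures reported in the list, and the dictionary layout and function name are illustrative assumptions. Because the recalls are rounded to two decimals, the results should match the reported scores only approximately.

```python
# Figures copied from the list above; recalls are rounded to two decimals.
# The dictionary keys and the helper name are illustrative, not from the analysis code.
reported = {
    "Naive Random Oversampling": {"high_recall": 0.68, "low_recall": 0.62, "balanced_accuracy": 0.6522},
    "SMOTE": {"high_recall": 0.61, "low_recall": 0.69, "balanced_accuracy": 0.6543},
    "Undersampling": {"high_recall": 0.68, "low_recall": 0.40, "balanced_accuracy": 0.5402},
    "Combination over/under sampling": {"high_recall": 0.77, "low_recall": 0.55, "balanced_accuracy": 0.6607},
}


def balanced_accuracy_from_recalls(high_recall, low_recall):
    """Return the unweighted mean of the two per-class recalls."""
    return (high_recall + low_recall) / 2


for method, scores in reported.items():
    approx = balanced_accuracy_from_recalls(scores["high_recall"], scores["low_recall"])
    print(f"{method}: {approx:.2f} from recalls, {scores['balanced_accuracy']:.4f} reported")
```

This also shows why the undersampling method scores lowest: its high risk recall matches naive random oversampling, but its low risk recall of 0.40 pulls the mean down.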
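
In summary, the combination over/under sampling method performed best of the four methods tried. It had the highest balanced accuracy score (0.6607) and the highest high risk recall score (0.77). High risk recall is important when identifying credit risk because it measures how many of the truly high risk loans the model catches. Every method had a high risk precision score of only 0.01, so most loans flagged as high risk were actually low risk.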
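
The comparison behind this summary can be expressed as a short selection over the reported scores. This is a minimal sketch, assuming the two criteria are balanced accuracy and high risk recall as stated in the summary. The `scores` dictionary and variable names are illustrative and are not taken from the analysis code.

```python
# (balanced accuracy, high risk recall) for each method, as reported in the results above.
scores = {
    "Naive Random Oversampling": (0.6522, 0.68),
    "SMOTE": (0.6543, 0.61),
    "Undersampling": (0.5402, 0.68),
    "Combination over/under sampling": (0.6607, 0.77),
}

# Pick the method with the highest value for each criterion.
best_by_balanced_accuracy = max(scores, key=lambda method: scores[method][0])
best_by_high_risk_recall = max(scores, key=lambda method: scores[method][1])

print("Highest balanced accuracy:", best_by_balanced_accuracy)
print("Highest high risk recall:", best_by_high_risk_recall)
```

Both criteria select the same method here, so no trade-off between them has to be made.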