
Machine learning project: implementation and comparison of several algorithms to classify loan applications from a historical dataset.

Primary language: Jupyter Notebook


Business problem

Predict whether a loan case will be paid off or not given data on previous loans.

Solution strategy

  1. Build classifiers using several machine learning algorithms:
  • k-nearest neighbours (KNN);
  • Decision Tree;
  • Support Vector Machine (SVM);
  • Logistic regression.
  2. Use accuracy metrics to select the best classifier.
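The two steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it assumes scikit-learn is available, uses a synthetic dataset from `make_classification` as a stand-in for the loan data, and the hyperparameters (`n_neighbors=7`, `max_depth=4`, `C=0.01`) are placeholders rather than the tuned values.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, jaccard_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the loan data (assumption: the real features differ).
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Standardize features using statistics from the training split only.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Step 1: one classifier per algorithm (hyperparameters are placeholders).
models = {
    "KNN": KNeighborsClassifier(n_neighbors=7),
    "DecisionTree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "SVM": SVC(kernel="rbf"),
    "LogisticRegression": LogisticRegression(C=0.01, solver="liblinear"),
}

# Step 2: score every model on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    scores[name] = {
        "jaccard": jaccard_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
    }

# Pick the model with the highest F1 score on the held-out data.
best = max(scores, key=lambda name: scores[name]["f1"])
print(best, scores[best])
```

Selecting on a single held-out split is the simplest option; cross-validation would give a less noisy comparison on a dataset this small.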


The following table summarizes the accuracy metrics obtained for each model (with parameters chosen to maximize accuracy while avoiding overfitting).

Model               Jaccard   F1 score  Log loss
KNN                 0.814286  0.897638  NA
DecisionTree        0.846154  0.916667  NA
SVM                 0.814286  0.730934  NA
LogisticRegression  0.814286  0.730934  0.578891
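For readers unfamiliar with the three metrics, the sketch below computes them in plain Python for a binary problem. The function names and the toy labels are made up for illustration, and this is not necessarily how the notebook computes its figures (in particular, F1 can be averaged across classes in several ways). Log loss needs predicted probabilities rather than hard labels, which is presumably why it is reported only for logistic regression.

```python
import math


def _counts(y_true, y_pred, positive=1):
    """Return (true positives, false positives, false negatives)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, fp, fn


def jaccard(y_true, y_pred, positive=1):
    """Jaccard index of the positive class: TP / (TP + FP + FN)."""
    tp, fp, fn = _counts(y_true, y_pred, positive)
    denom = tp + fp + fn
    return tp / denom if denom else 0.0


def f1(y_true, y_pred, positive=1):
    """F1 score of the positive class: 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = _counts(y_true, y_pred, positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def log_loss(y_true, proba, eps=1e-15):
    """Mean negative log-likelihood; proba[i] is P(label == 1)."""
    total = 0.0
    for t, p in zip(y_true, proba):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -math.log(p) if t == 1 else -math.log(1 - p)
    return total / len(y_true)


# Toy labels, invented for this example.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1]
j = jaccard(y_true, y_pred)  # 3 / 5
f = f1(y_true, y_pred)       # 6 / 8
print(j, f)
```

For a single positive class the two scores are linked by F1 = 2J / (1 + J), so they always rank models the same way under that definition.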


The best model is the Decision Tree. It achieved the highest Jaccard and F1 scores, and it made fewer incorrect pay-off predictions than the other models. Check out its confusion matrix:
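A confusion matrix of this kind can be tabulated from the true and predicted labels as in the minimal sketch below. The label names `PAIDOFF` and `COLLECTION` and the sample predictions are assumptions made for illustration, not values taken from the project.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


# Hypothetical labels and predictions, for illustration only.
labels = ["PAIDOFF", "COLLECTION"]
y_true = ["PAIDOFF", "PAIDOFF", "COLLECTION", "PAIDOFF", "COLLECTION"]
y_pred = ["PAIDOFF", "PAIDOFF", "PAIDOFF", "COLLECTION", "COLLECTION"]

matrix = confusion_matrix(y_true, y_pred, labels)

# Loans predicted as paid off that actually were not.
wrong_payoffs = matrix[labels.index("COLLECTION")][labels.index("PAIDOFF")]

for label, row in zip(labels, matrix):
    print(f"{label:<12}{row}")
```

The off-diagonal cell in the `COLLECTION` row counts loans predicted as paid off that were not, which is the error the comparison above refers to.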



This is the Capstone Project of the course Machine Learning with Python, an online non-credit course authorized by IBM Skills Network and offered through Coursera.