Project Overview

  • The goal of this project is to train machine learning classification models on pre-2018 Lending Club loan data and use them to predict default probabilities for Lending Club loans issued in 2018
  • Using these predictions, an IRR-optimized portfolio of the highest-yielding 2018 loans is constructed for a hypothetical investor looking to maximize returns on this loan set
  • To present and visualize key ML results and recommendations, an interactive dashboard application is created with a Python Dash frontend and a Flask backend
    • See Build and run app for instructions on how to build and run the app
    • See Sample app visualizations for sample screenshots of the app
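
The idea of turning predicted default probabilities into a loan selection can be illustrated with a minimal sketch. This is not the project's actual IRR optimization: it swaps in a simplified one-period expected-return proxy, and the `Loan` class, its field names, the `recovery_rate` parameter, and the sample values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Loan:
    loan_id: str      # hypothetical identifier
    int_rate: float   # annual interest rate as a fraction (0.12 = 12%)
    p_default: float  # model-predicted probability of default


def expected_return(loan: Loan, recovery_rate: float = 0.0) -> float:
    """One-period proxy (an assumption, not a true IRR):
    earn the interest rate if the loan is repaid,
    lose (1 - recovery_rate) of principal if it defaults."""
    loss_given_default = 1.0 - recovery_rate
    return (1.0 - loan.p_default) * loan.int_rate - loan.p_default * loss_given_default


def select_portfolio(loans: List[Loan], top_n: int) -> List[Loan]:
    """Rank loans by expected return and keep the top_n."""
    return sorted(loans, key=expected_return, reverse=True)[:top_n]


# Made-up sample data, for illustration only
sample_loans = [
    Loan("A", int_rate=0.20, p_default=0.30),
    Loan("B", int_rate=0.10, p_default=0.02),
    Loan("C", int_rate=0.15, p_default=0.05),
]
picked = select_portfolio(sample_loans, top_n=2)
print([loan.loan_id for loan in picked])
```

In this toy example the loan with the highest interest rate (A) is left out, because its predicted default probability pushes its expected return below zero. A real IRR calculation would instead work from each loan's full schedule of monthly cash flows.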

Build and run app

  • This app uses docker-compose to run and network our frontend and backend services
  • If docker-compose is not already installed, see installation instructions

Scripted e2e:

# Run from root dir
bash build_e2e.sh

Manually:

  • See /app/backend/build_backend.md for instructions on how to manually build the backend
  • See /app/frontend/build_frontend.md for instructions on how to manually build the frontend

Sample app visualizations

  1. Distributions of loan grades by state:

  2. Loan default rates & interest rates vs FICO score:

  3. Retrieve live ML model default predictions on sample anonymized customer data:

Dataset

Blog post + live presentation

Presentation slides

Please refer to /presentation/NYCDSA_Capstone_Presentation_vF.pdf for the slides of the presentation on this project, given on January 5th, 2021 to prospective NYCDSA students and alumni.