- The goal of this project is to train machine learning classification models on pre-2018 Lending Club loan data and use them to predict default probabilities for the dataset's loans issued in 2018
- These predictions are then used to construct an IRR (internal rate of return)-optimized portfolio of the highest-yielding 2018 loans for a hypothetical investor seeking to maximize returns on this loan set
- To better present and visualize the key ML results and recommendations, we built an interactive dashboard application with a Python Dash frontend and a Flask backend
- See Build and run app for instructions on building and running the app
- See Sample app visualizations for sample screenshots of the app
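The modeling and portfolio steps in the overview can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's code: the `Loan` record, all function names, the constant monthly default hazard, and the zero-recovery-after-default rule are hypothetical simplifications. In the actual pipeline, `p_default` would come from the trained classifiers rather than being set by hand.

```python
from dataclasses import dataclass


@dataclass
class Loan:
    # Hypothetical minimal loan record; the real dataset has many more columns.
    loan_id: str
    issue_year: int
    principal: float
    annual_rate: float       # e.g. 0.12 for a 12% annual interest rate
    term_months: int         # Lending Club terms are 36 or 60 months
    p_default: float = 0.0   # predicted probability of default over the term (< 1)


def split_by_issue_year(loans, cutoff_year=2018):
    """Train on loans issued before cutoff_year; score loans issued in cutoff_year."""
    train = [ln for ln in loans if ln.issue_year < cutoff_year]
    score = [ln for ln in loans if ln.issue_year == cutoff_year]
    return train, score


def monthly_payment(principal, annual_rate, n):
    """Standard fixed-payment amortization formula."""
    i = annual_rate / 12
    if i == 0:
        return principal / n
    return principal * i / (1 - (1 + i) ** -n)


def expected_cash_flows(loan):
    """Expected monthly cash flows, ASSUMING a constant monthly default hazard
    and zero recovery once a borrower defaults."""
    n = loan.term_months
    payment = monthly_payment(loan.principal, loan.annual_rate, n)
    # Hazard h chosen so that cumulative default probability over n months is p_default.
    h = 1 - (1 - loan.p_default) ** (1 / n)
    flows = [-loan.principal]  # investor funds the loan at t = 0
    for t in range(1, n + 1):
        survival = (1 - h) ** t  # probability the borrower is still paying at month t
        flows.append(payment * survival)
    return flows


def npv(rate, flows):
    """Net present value of flows[t] received at month t."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(flows))


def irr(flows, lo=-0.99, hi=1.0, tol=1e-10):
    """Monthly IRR by bisection. With one outflow at t = 0 followed by inflows,
    NPV is strictly decreasing in the rate, so the root is unique."""
    if npv(lo, flows) * npv(hi, flows) > 0:
        raise ValueError("IRR not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if npv(mid, flows) > 0:
            lo = mid  # NPV still positive: the root lies at a higher rate
        else:
            hi = mid
    return (lo + hi) / 2


def annualized_expected_irr(loan):
    """Compound the monthly IRR of the expected cash flows to an annual rate."""
    return (1 + irr(expected_cash_flows(loan))) ** 12 - 1


def top_loans_by_irr(loans, k):
    """Greedy selection: the k loans with the highest expected annualized IRR."""
    return sorted(loans, key=annualized_expected_irr, reverse=True)[:k]
```

With `p_default=0` the expected IRR reduces to the loan's own monthly rate (1% for a 12% loan), and any positive default probability lowers it. Ranking by expected IRR alone is a greedy stand-in; it ignores budget, diversification, and any other constraints a real portfolio construction might impose.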
- This app uses docker-compose to run and network our frontend and backend services
- If docker-compose is not already installed, see the installation instructions
Scripted (end-to-end):
# Run from root dir
bash build_e2e.sh
Manually:
- See /app/backend/build_backend.md for instructions on manually building the backend
- See /app/frontend/build_frontend.md for instructions on manually building the frontend
- Distributions of loan grades by state:
- Loan default rates & interest rates vs FICO score:
- Retrieve live ML model default predictions on sample anonymized customer data:
- See the Kaggle dataset
- Link to blog post
- Link to live presentation (YouTube)
Please refer to /presentation/NYCDSA_Capstone_Presentation_vF.pdf for the slides of the presentation on this project, given on January 5th, 2021 to NYCDSA prospective students and alumni.