We have two goals for this project. The first is to predict the probability that an individual defaults on a loan. The second is to identify the features that matter most in determining whether an individual will default. We report important features alongside probabilities because individuals may want to know which factors are keeping their probability of default from being lower.
- Exploratory data analysis and feature engineering on the provided data
- Balanced training data using SMOTE
- Modeling using decision tree classifier, random forest classifier, logistic regression, and ridge classifier
- Creation of an interactive dashboard for users
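The balancing step above can be illustrated with a minimal sketch of the idea behind SMOTE: each synthetic minority row is an interpolation between an existing minority row and one of its nearest minority neighbors. The source does not say which SMOTE implementation the project used, so this is a simplified standard-library version, not the project's code. The function name `smote_oversample`, the two-column `[income, age]` layout, and all data values are hypothetical.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between minority-class neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of the base row (excluding the row itself)
        others = [row for row in minority if row is not base]
        neighbors = sorted(others, key=lambda row: math.dist(base, row))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Hypothetical toy data: [income, age] rows; values are made up for illustration.
majority = [[50.0, 30.0], [62.0, 41.0], [58.0, 35.0],
            [71.0, 52.0], [66.0, 47.0], [54.0, 29.0]]
minority = [[20.0, 23.0], [24.0, 26.0], [22.0, 21.0]]

# Generate enough synthetic rows to match the majority class size.
new_rows = smote_oversample(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_rows
```

Oversampling of this kind should be applied only to the training split, as the list above indicates, so that evaluation data still reflects the original class balance.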
Important features for predicting a default on a loan:
- Income
- Age
- Experience
- Years at current job
- GDP (state)
- Years in current house
Future work:
- Integrate additional features to improve the predictive power of the model
- Include loan amounts, which will likely influence default rates
- Improve dashboard functionality for users
This project uses consumer data from a hackathon organized by "Univ.AI". It is intended to benefit lenders, borrowers, and government institutions. We believe lenders will benefit because they will be able to estimate the likelihood of an individual defaulting on a loan and distribute loans more efficiently. Borrowers will benefit because they will be able to identify the areas they need to improve in order to lower their likelihood of default. Government institutions will benefit because they will be able to determine where to provide aid.
The dataset contains 12 columns: income, age, experience, profession, marital status, house ownership, car ownership, risk flag, job years, house years, city, and state. All values were provided at the time of the loan application.
- Dataset: Loan Prediction Based on Customer Behavior
- Topic Selection: Topic Selection Google Docs
- Team Charter: Team Charter Google Docs
- Project Plan: Project Plan Google Docs
- Application: Streamlit Loan Default Web App
- Presentation: Slideshow
Matthew Flaherty, M.S. in Data Science Candidate
Vanderbilt University
matthew.i.flaherty@Vanderbilt.Edu
Jiayi Fu, M.S. in Data Science Candidate
Vanderbilt University
jiayi.fu@Vanderbilt.Edu
Logan King, M.S. in Data Science Candidate
Vanderbilt University
logan.a.king@Vanderbilt.Edu
Wenqi Lyu, M.S. in Data Science Candidate
Vanderbilt University
lyu.wenqi@Vanderbilt.Edu
Yuechen Yang, M.S. in Data Science Candidate
Vanderbilt University
yuechen.yang@Vanderbilt.Edu