/Prosper_Loans_data_exploration

This document explores the Prosper Loans dataset. The data set contains 113,937 loans with 81 variables on each loan, including loan amount, borrower rate (or interest rate), current loan status, borrower income, and many others. The data set was last updated on 03/11/2014. In this project we explore the characteristics of variables that can affect the loan status and look at the relationships among multiple variables using summary statistics and data visualizations.


Binder

Prosper Loan Data Exploration

by Anthony Odiba

Dataset

This data set contains 113,937 loans with 81 variables on each loan, including loan amount, borrower rate (or interest rate), current loan status, borrower income, and many others. The dataset can be found here. This data dictionary explains the variables in the data set.

Preview


Note:

The files are too large for GitHub to render, so please note the important instructions below.

  • 1. Click this link if you didn't notice the Binder badge above.
  • 2. Alternatively, if you like doing things the harder way, use html preview to view them: copy the link to the file you want to view and paste it in the URL bar provided on the page.
  • 3. My final .ipynb was too large (~64MB) to upload to GitHub, so I used Git Large File Storage. The instructions on how to do this can be seen on the site.
  • 4. I think the file size could be reduced by dropping some columns of the dataset used. I haven't explored this.
  • 5. Because of the large plotly plots, the cell where the plots are run keeps timing out. This helped, as well as this.
  • 6. I cloned this repo to my JupyterLab, and these three articles helped in keeping things sane. Click here, here next and lastly here.
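
As a rough illustration of notes 4 and 5, one way to keep the notebook and its plots smaller is to read only the columns that are actually plotted and, if needed, to sample the rows before plotting. The sketch below is untested against the real file: the helper `load_slim`, the `KEEP_COLUMNS` list, and the in-memory demo CSV (including its `UnusedColumn`) are hypothetical, and the column names are taken from this README rather than checked against the data dictionary.

```python
import io

import pandas as pd

# Columns named in this README; the real CSV headers may differ, so treat
# this list as an assumption and adjust it to the data dictionary.
KEEP_COLUMNS = [
    "ProsperScore",
    "BorrowerAPR",
    "DebtToIncomeRatio",
    "StatedMonthlyIncome",
]


def load_slim(source, columns=None, max_rows=None, seed=0):
    """Read only `columns` from a CSV and optionally downsample the rows."""
    if columns is None:
        columns = KEEP_COLUMNS
    # usecols makes pandas skip every other column while parsing.
    df = pd.read_csv(source, usecols=columns)
    if max_rows is not None and len(df) > max_rows:
        # A fixed seed keeps the sample reproducible between notebook runs.
        df = df.sample(n=max_rows, random_state=seed)
    return df.reset_index(drop=True)


# Tiny in-memory stand-in for the real file; every value here is made up.
demo_csv = io.StringIO(
    "UnusedColumn,ProsperScore,BorrowerAPR,DebtToIncomeRatio,StatedMonthlyIncome\n"
    "a1,7,0.16,0.17,3083.33\n"
    "b2,9,0.12,0.18,6125.00\n"
    "c3,4,0.28,0.26,2875.00\n"
)
slim = load_slim(demo_csv, max_rows=2)
print(slim.shape)  # (2, 4)
```

With the real data, `source` would be the path to the downloaded CSV. Whether this brings the notebook under GitHub's size limit depends on how much of the ~64MB comes from the embedded plot data, which has not been measured here.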

Lastly, this git article helped me learn how to move files and folders around.


Visualisation Libraries Used

Apart from the usual suspects, i.e. seaborn, matplotlib, etc., I played around with folium, plotly and plotly express to come up with some beautiful and useful visualisations for this project.

Summary of Findings

Key Insights for Presentation

For the presentation, I focus on variables that had an effect on the ProsperScore or CreditGrade. I start by introducing the distribution of Prosper loans around the U.S.A., then look at how much various groups access Prosper loans. We delve deeper by observing how DebtToIncomeRatio and BorrowerAPR interact with respect to CreditGrade, then how StatedMonthlyIncome interacts with LoanAmount with respect to ProsperScore. We conclude with the strongest observed relationship: BorrowerAPR vs ProsperScore.
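
A relationship such as BorrowerAPR vs ProsperScore can be summarised numerically before plotting it. The snippet below is only a sketch of that idea: the `toy` frame contains invented values for illustration and does not reproduce any number from the dataset or the notebook.

```python
import pandas as pd

# Invented toy rows for illustration only; they are not taken from the dataset.
toy = pd.DataFrame(
    {
        "ProsperScore": [2, 2, 6, 6, 10, 10],
        "BorrowerAPR": [0.36, 0.32, 0.24, 0.20, 0.10, 0.08],
    }
)

# Pearson correlation between the two columns (the pandas default method).
r = toy["ProsperScore"].corr(toy["BorrowerAPR"])

# Median APR within each score, a compact table to accompany a box plot.
median_apr = toy.groupby("ProsperScore")["BorrowerAPR"].median()

print(round(r, 3))
print(median_apr)
```

On the real data the same two calls would run on the loaded DataFrame, after dropping rows where either column is missing.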

An interesting addition is a parallel plot showing the interaction of 9 different variables.

Some More Parallel Plots