- This analysis report covers an online loan study. It presents multiple derivations based on return on investment, loan defaults, and the number of loans issued by employment, state, loan purpose and other metrics.
- Data shape - 39717 rows and 111 columns
Python 3.0. Packages: pandas, numpy, matplotlib, seaborn.
After reading the data, the following changes were made:
- Fixing Rows and Columns: DataType changes and column values fixing
- Convert string objects to date objects for the following columns: issue_d, last_pymt_date, int_rate
- Removal of extra space in column value : term
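The type fixes above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the sample values, the `%b-%y` date format, and the treatment of `int_rate` as a percent string parsed to a number are all assumptions.

```python
import pandas as pd

# Hypothetical sample rows; the real values and formats may differ.
loans = pd.DataFrame({
    "issue_d": ["Dec-11", "Nov-10"],
    "term": [" 36 months", " 60 months"],
    "int_rate": ["10.65%", "15.27%"],
})

# Parse month-year strings (assumed format: abbreviated month, two-digit year).
loans["issue_d"] = pd.to_datetime(loans["issue_d"], format="%b-%y")

# Remove leading and trailing whitespace from the term values.
loans["term"] = loans["term"].str.strip()

# Assumption: int_rate is a percent string, so drop "%" and convert to float.
loans["int_rate"] = loans["int_rate"].str.rstrip("%").astype(float)

print(loans.dtypes)
```

Passing an explicit `format` to `pd.to_datetime` avoids ambiguous parsing when the strings carry only a month and a year.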
Dropping unnecessary columns and columns with more than 70% missing values, and filling missing values:
- Let's drop the columns with more than 30% missing values (since the data is already huge).
- Since the data is not widely spread and the difference between mean and median is very small, let's impute the missing values with the mean for column: revol_util.
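One way to express the column drop and mean imputation is sketched below. The helper `drop_sparse_columns`, its `max_missing` parameter, and the `mostly_empty` column are hypothetical names introduced for illustration; the threshold is left as a parameter so either cutoff mentioned above can be used.

```python
import numpy as np
import pandas as pd

def drop_sparse_columns(df, max_missing=0.30):
    """Keep only columns whose fraction of missing values is <= max_missing."""
    missing_fraction = df.isnull().mean()  # per-column share of NaN values
    return df.loc[:, missing_fraction <= max_missing]

# Hypothetical sample data; "mostly_empty" is an invented column name.
loans = pd.DataFrame({
    "revol_util": [10.0, 20.0, np.nan, 30.0],
    "mostly_empty": [np.nan, np.nan, np.nan, 1.0],
})

# Drop columns that are more than 30% missing, then work on a copy.
loans = drop_sparse_columns(loans, max_missing=0.30).copy()

# Impute the remaining gaps in revol_util with the column mean.
loans["revol_util"] = loans["revol_util"].fillna(loans["revol_util"].mean())

print(loans)
```

Mean imputation is reasonable here only because the mean and median are close; for a skewed column the median would usually be the safer choice.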
- The plots above show that the largest number of loans came from grades B, A and C, and the smallest from grade G.
- Among sub-grades, A4 and B3 have the most loans.
- The 3rd plot shows that A, B and C grade loans have lower interest rates and E, F and G grade loans have higher interest rates. The 1st and 2nd plots show more loans from A, B and C grades (with a granularity check from sub-grades). A possible reason is that loan applicants from A, B and C grades have better credit scores and lower risk.
- The 4th plot shows high funded amounts in A, B, C and D grades, as applicants from these grades have better credit scores and lower risk.
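The numbers behind grade-level plots like these can be computed with a couple of group-by operations. The sketch below uses invented sample rows and assumes the columns are named `grade`, `int_rate` and `funded_amnt`; it does not reproduce the report's figures.

```python
import pandas as pd

# Hypothetical sample rows; column names are assumptions.
loans = pd.DataFrame({
    "grade": ["A", "B", "B", "C", "G"],
    "int_rate": [6.0, 10.0, 12.0, 14.0, 21.0],
    "funded_amnt": [5000, 8000, 12000, 10000, 15000],
})

# Number of loans per grade, ordered A..G rather than by frequency.
loans_per_grade = loans["grade"].value_counts().sort_index()

# Median interest rate and median funded amount per grade.
median_rate = loans.groupby("grade")["int_rate"].median()
median_funded = loans.groupby("grade")["funded_amnt"].median()

print(loans_per_grade)
print(median_rate)
print(median_funded)
```

Each resulting Series can be passed to a bar plot, or the raw columns can be given to a count plot or box plot in seaborn.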
We see that the majority of borrowers have been employed for at least 10 years.
There are more defaulters among borrowers with RENT and MORTGAGE home ownership.
There are more defaulters from 'debt_consolidation', 'other', 'credit_card' and 'small_business'.
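Defaulter counts per category can be tabulated as sketched below, alongside the default rate within each category. The function `default_summary` and the sample rows are hypothetical; the sketch assumes `loan_status` holds the values 'Fully Paid' and 'Charged Off' and that the grouping column is named `home_ownership`.

```python
import pandas as pd

# Hypothetical sample rows; column names are assumptions.
loans = pd.DataFrame({
    "home_ownership": ["RENT", "RENT", "MORTGAGE", "OWN", "RENT", "MORTGAGE"],
    "loan_status": ["Charged Off", "Fully Paid", "Fully Paid",
                    "Fully Paid", "Charged Off", "Charged Off"],
})

def default_summary(df, by):
    """Count defaults and loans per category and derive the default rate."""
    is_default = df["loan_status"].eq("Charged Off")
    summary = is_default.groupby(df[by]).agg(defaults="sum", loans="count")
    summary["default_rate"] = summary["defaults"] / summary["loans"]
    return summary

summary = default_summary(loans, by="home_ownership")
print(summary)
```

Comparing `defaults` with `default_rate` helps separate categories that simply have many loans from categories where a larger share of loans is charged off. The same function could be called with `by="purpose"`, assuming such a column exists.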
- The number of loans issued increased steadily every year, with a slight decrease in 2008.
- Of settled loans, 83% were Fully Paid and 14% were Charged Off.
- Borrowers who own a house and whose loan purpose is debt consolidation, 'credit_card' or 'small_business' are not at much risk, but borrowers with rent or mortgage are high-risk applicants.
- Majority of loans were from A, B, and C grade.
- There is an inverse relationship between interest rate and loan grade: lower grades (E, F, G) have higher interest rates.
- Overall, there are more defaulters from 'debt_consolidation', 'others', 'credit_card' and 'small_business' purpose loans from all grades.
- This repo covers what a beginner needs to understand data exploration.
- It walks through data cleaning, data preprocessing and data visualisation, with multiple cases for each.
- It works with both categorical and numerical data types and can serve as a go-to guide for EDA.