A credit scoring model is a statistical model typically used to decide whether to accept or reject a loan application. Based on information about the borrower (e.g. age, number of previous loans), it distinguishes between "good" and "bad" loans and estimates the probability of default.
Loan Default: In finance, default is the failure to meet the legal obligations of a loan, for example when a home buyer fails to make a mortgage payment, or when a corporation or government fails to pay a bond that has reached maturity.
- Part 1 - Data Processing: Cleaning and transforming the raw data into an understandable format
- Part 2 - Profiling: Data profiling is the process of examining the data available from an existing information source (e.g. a database or a file) and collecting statistics or informative summaries about that data.
- Part 3 - Logistic Regression Model: In statistics, logistic regression, or logit regression, is a regression model where the dependent variable is categorical. Here the dependent variable is binary, that is, the output can take only two values, "0" and "1", which represent outcomes such as pass/fail or win/lose (in this project, a "good" or "bad" loan).
- Part 4 - Decision Tree Model
- Part 5 - Clustering Analysis
- Part 6 - Principal Components Analysis
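
To make the logistic regression idea in Part 3 concrete, the following is a minimal sketch of how such a model turns borrower features into an estimated probability of default and then into an accept/reject decision. The function names, the two example features, the coefficient values, and the 0.5 threshold are all hypothetical and chosen only for illustration; they are not fitted on the dataset and are not the model built in this project.

```python
import math

def default_probability(features, weights, intercept):
    """Logistic model: P(y = 1 | x) = 1 / (1 + exp(-(b0 + w . x)))."""
    score = intercept + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))

def decide(probability, threshold=0.5):
    """Map an estimated probability of default to a binary decision."""
    return "reject" if probability >= threshold else "accept"

# Hypothetical coefficients for two features (assumed here to be job
# seniority in years and a 0/1 records flag). Not fitted on any data.
weights = [-0.1, 1.5]
intercept = -1.0

p = default_probability([10, 0], weights, intercept)
print(round(p, 4), decide(p))
```

In practice the coefficients would be estimated from the training data, and the threshold would be chosen according to how costly a missed default is compared with a rejected good loan.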
The raw dataset is in the file "CreditScoring.csv", which contains 4455 rows and 14 columns:

| No. | Variable | Description |
|---|---|---|
| 1 | Status | credit status |
| 2 | Seniority | job seniority (years) |
| 3 | Home | type of home ownership |
| 4 | Time | time of requested loan |
| 5 | Age | client's age |
| 6 | Marital | marital status |
| 7 | Records | existence of records |
| 8 | Job | type of job |
| 9 | Expenses | amount of expenses |
| 10 | Income | amount of income |
| 11 | Assets | amount of assets |
| 12 | Debt | amount of debt |
| 13 | Amount | requested loan amount |
| 14 | Price | price of good |
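
As a first step before any processing, it can help to load the file and confirm that it has the 14 columns listed above. The sketch below uses only the Python standard library and assumes the CSV is comma-delimited with a header row whose names match the variable names exactly; `load_credit_data` and `EXPECTED_COLUMNS` are hypothetical names, and the two sample rows are made-up placeholder values, not records from the dataset.

```python
import csv
import io

# Column names as listed above (assumed to match the CSV header exactly).
EXPECTED_COLUMNS = [
    "Status", "Seniority", "Home", "Time", "Age", "Marital", "Records",
    "Job", "Expenses", "Income", "Assets", "Debt", "Amount", "Price",
]

def load_credit_data(source):
    """Read the CSV from an open text file and return a list of row dicts.

    Raises ValueError if the header does not match EXPECTED_COLUMNS.
    """
    reader = csv.DictReader(source)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    return list(reader)

# Made-up two-row sample standing in for the real file.
sample = io.StringIO(
    ",".join(EXPECTED_COLUMNS) + "\n"
    "1,5,2,36,40,2,1,1,50,120,3000,0,1000,1500\n"
    "2,0,1,48,25,1,2,3,35,80,0,500,900,1100\n"
)
rows = load_credit_data(sample)
print(len(rows), len(rows[0]))
```

With the real file, the same function could be called as `load_credit_data(open("CreditScoring.csv", newline=""))`, after which the row count can be compared against the 4455 rows stated above. Note that `csv.DictReader` returns every value as a string, so numeric columns still need to be converted during data processing.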