
A typical way to detect fraud in credit card transactions. See the README for more details.

Kaggle-Home-Credit-Default-Risk

This project is based on the Kaggle competition Home Credit Default Risk (https://www.kaggle.com/c/home-credit-default-risk).

Home Credit & Its Business

Many people struggle to get loans due to insufficient or non-existent credit histories. And, unfortunately, this population is often taken advantage of by untrustworthy lenders.

Home Credit strives to broaden financial inclusion for the unbanked population by providing a positive and safe borrowing experience. In order to make sure this underserved population has a positive loan experience, Home Credit makes use of a variety of alternative data, including telco and transactional information, to predict their clients' repayment abilities.

While Home Credit is currently using various statistical and machine learning methods to make these predictions, they're challenging Kagglers to help them unlock the full potential of their data. Doing so will ensure that clients capable of repayment are not rejected and that loans are given with a principal, maturity, and repayment calendar that will empower their clients to be successful.

Why is it interesting?

Home Credit is trying to serve the market of financially underserved people. Banks usually avoid doing business with this group because of:

    1. Low profit per deal
    2. High risk of bad loans
    3. Lots of paperwork, and several days to get approval

This makes the project practically useful: you could start a similar business by leveraging the idea behind it while keeping the risk under control. (Of course, different populations and different regions will behave differently, but at least there are workarounds. For example, we could extract the most important variables and start a similar business with stricter thresholds on those variables.)

In short, it helps solve the new-starter problem. By the way, many companies have started offering small loans to financially underserved people.

My goals for this project

Since I will be working alone, I will start simple, but not so simple that I deliberately ignore any dataset or information.

What I will do:

  1. Leverage all the available information, write production-level code, and build a base model that leaves room for further improvement

  2. Don't optimize details at the start

  3. Never give up before finishing the base model
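Leveraging all the information matters here because the competition data is split across several files, and the secondary tables can hold many rows per applicant. Below is a minimal sketch of one common way to handle that, assuming rows are plain dicts keyed by an applicant ID named `SK_ID_CURR`: aggregate each one-to-many table into per-applicant summary features, then left-join them onto the main table so a base model sees one flat row per applicant. The table names, column names, and values are illustrative assumptions, and `aggregate_child` and `left_join` are hypothetical helpers, not part of this project's code. With pandas, the same steps would typically be a `groupby(...).agg(...)` followed by a `merge(..., how="left")`.

```python
from collections import defaultdict


def aggregate_child(rows, key, columns):
    """Collapse a one-to-many table into one dict of summary features per key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    features = {}
    for key_value, group in groups.items():
        feats = {"count": len(group)}
        for col in columns:
            # Skip missing values instead of failing on them.
            values = [r[col] for r in group if r.get(col) is not None]
            if values:
                feats[f"{col}_mean"] = sum(values) / len(values)
                feats[f"{col}_max"] = max(values)
        features[key_value] = feats
    return features


def left_join(main_rows, features, key, prefix):
    """Attach aggregated features to every main-table row (left join on key)."""
    names = sorted({name for feats in features.values() for name in feats})
    joined = []
    for row in main_rows:
        feats = features.get(row[key], {})
        new_row = dict(row)
        for name in names:
            # Applicants with no child rows get count 0; absent aggregates become None.
            default = 0 if name == "count" else None
            new_row[f"{prefix}_{name}"] = feats.get(name, default)
        joined.append(new_row)
    return joined


# Illustrative rows; column names and values are assumptions.
application = [
    {"SK_ID_CURR": 1, "AMT_INCOME_TOTAL": 50000, "TARGET": 0},
    {"SK_ID_CURR": 2, "AMT_INCOME_TOTAL": 30000, "TARGET": 1},
    {"SK_ID_CURR": 3, "AMT_INCOME_TOTAL": 42000, "TARGET": 0},
]
bureau = [
    {"SK_ID_CURR": 1, "AMT_CREDIT_SUM_DEBT": 1000.0},
    {"SK_ID_CURR": 1, "AMT_CREDIT_SUM_DEBT": 3000.0},
    {"SK_ID_CURR": 2, "AMT_CREDIT_SUM_DEBT": None},
]

bureau_features = aggregate_child(bureau, "SK_ID_CURR", ["AMT_CREDIT_SUM_DEBT"])
flat = left_join(application, bureau_features, "SK_ID_CURR", prefix="BUREAU")
for row in flat:
    print(row)
```

Every applicant keeps exactly one row, so any base model can be trained on `flat` directly; covering more columns later only means extending the `columns` list, which fits the goal of leaving room for further improvement.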

To be continued ...