CSYE7200 Credit Scoring Analysis

Group Members

  • Ting-kai Liu, NUID: 001306707
  • Xuyang Li, NUID: 001409590
  • Xing Dong, NUID: 001718652

Summary:

This project predicts the probability that a person will experience financial distress in the next two years, using two models. We evaluate and compare the performance of both models and then present our decision.

Models

Both a Random Forest model and a Logistic Regression model are used to predict whether someone will experience financial distress in the next two years.

  • For Random Forest: the model generates a binary result, either yes or no.
  • For Logistic Regression: the model generates a number from 0 to 1.
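
The difference between the two output types can be sketched in a few lines of Scala. This is a minimal illustration, not the project's actual model code: the object name `ModelOutputs`, the weights, the bias, and the tie-breaking rule are all assumptions made for the example.

```scala
object ModelOutputs {
  // Logistic function: maps any real-valued score into the interval (0, 1).
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // Logistic Regression style output: a number between 0 and 1 computed
  // from a linear score. Weights and bias are hypothetical, not fitted values.
  def logisticProbability(features: Seq[Double], weights: Seq[Double], bias: Double): Double = {
    require(features.length == weights.length, "features and weights must have the same length")
    sigmoid(features.zip(weights).map { case (x, w) => x * w }.sum + bias)
  }

  // Random Forest style output: each tree votes yes (true) or no (false)
  // and the majority decides. Ties are resolved as "no" (an assumption).
  def majorityVote(treeVotes: Seq[Boolean]): Boolean =
    treeVotes.count(identity) * 2 > treeVotes.length
}
```

Under these assumptions, `ModelOutputs.majorityVote(Seq(true, true, false))` yields a yes, while `ModelOutputs.logisticProbability` always yields a value strictly between 0 and 1.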

Data

        >train.csv: Dataset for training models
        >test.csv: Dataset used to make predictions and evaluate the models' performance

You can download the datasets from [https://www.kaggle.com/c/GiveMeSomeCredit/data?select=cs-training.csv]

Prediction system:

  • How it works:
        >The system asks the user to provide 10 parameters for each prediction.
        >For each prediction, the system creates a record for that person.
        >The system returns two results, one from each of the two models.
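
The flow above could be modelled roughly as follows. This is only a sketch under stated assumptions: it assumes the 10 parameters are the ten feature columns of the Kaggle "Give Me Some Credit" dataset, and the names `Applicant`, `PredictionRecord`, and `PredictionSystem.predict` are hypothetical, not taken from the project. The two models are passed in as plain functions so the example stays self-contained.

```scala
// One person's inputs. Field names follow the ten feature columns of the
// Kaggle dataset, adapted to Scala naming (an assumption about the 10 parameters).
final case class Applicant(
  revolvingUtilizationOfUnsecuredLines: Double,
  age: Int,
  numberOfTime30To59DaysPastDueNotWorse: Int,
  debtRatio: Double,
  monthlyIncome: Double,
  numberOfOpenCreditLinesAndLoans: Int,
  numberOfTimes90DaysLate: Int,
  numberRealEstateLoansOrLines: Int,
  numberOfTime60To89DaysPastDueNotWorse: Int,
  numberOfDependents: Int
)

// The record formed for each prediction: the inputs plus one result per model.
final case class PredictionRecord(
  applicant: Applicant,
  randomForestDistress: Boolean,          // yes/no result
  logisticRegressionProbability: Double   // number from 0 to 1
)

object PredictionSystem {
  // Runs both models on the same inputs and bundles the two results into one record.
  def predict(
    applicant: Applicant,
    randomForest: Applicant => Boolean,
    logisticRegression: Applicant => Double
  ): PredictionRecord =
    PredictionRecord(applicant, randomForest(applicant), logisticRegression(applicant))
}
```

Passing the models as function arguments keeps the record-building step independent of how each model is trained or loaded, so stand-in functions can be used when trying the sketch out.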

Setup

You will need the correct versions of Java and sbt. The template requires:

  • Java Software Developer's Kit (SE) 1.8 or higher
  • sbt 1.3.4 or higher

To build and run the project: