Hello! As part of our Certified Summer Program, our group was named 'Andrew Ng' in honor of the computer scientist and technology entrepreneur. We were free to choose our own datasets and analyze them using machine learning algorithms.
- Annie Chandolu
- Anusree Kanadath
- Meenakshi Somisetty
- Ravalika Reddy Aduri
- Sushmitha Vulapalli
- Classifying an imbalanced education loan dataset to better predict the probability of a loan being sanctioned when applying for an H1-B1 Visa in the USA.
- Accuracy is not a good measure for evaluating classifier performance on such a dataset (the accuracy paradox), and generalization is a big challenge, so other parameters were considered to classify the data better.
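To make the accuracy paradox concrete, here is a minimal sketch in plain Python. The labels are hypothetical (95 samples in class 0 and 5 in class 1) and are not taken from our datasets; the "classifier" simply predicts the majority class every time.

```python
# Hypothetical, made-up labels: 95 samples in class 0 and 5 in class 1.
y_true = [0] * 95 + [1] * 5

# A trivial "classifier" that always predicts the majority class.
y_pred = [0] * len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that were predicted positive."""
    actual_pos = sum(t == positive for t in y_true)
    true_pos = sum(t == positive and p == positive
                   for t, p in zip(y_true, y_pred))
    return true_pos / actual_pos if actual_pos else 0.0


acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
print(f"accuracy = {acc:.2f}, recall on class 1 = {rec:.2f}")
```

The accuracy looks high even though not a single class 1 sample is found, which is why a measure such as AUC is a more informative choice for imbalanced data.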
We used a few datasets from Kaggle, then preprocessed the data and split it into training and testing sets.
We first applied a Naïve Bayes classifier with K-fold cross-validation to the datasets, which gave unsatisfactory results. We then used the XGBoost (XGB) algorithm, which gave good results. We used four different parameters for better classification.
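For readers unfamiliar with K-fold cross-validation, the splitting step can be sketched as follows. This is an illustration in plain Python, not the code from our notebook; the function name `k_fold_indices`, the seed, and the sample count are all hypothetical. In practice a library routine (for example scikit-learn's `KFold`) would normally be used instead.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread samples as evenly as possible: the first (n_samples % k)
    # folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# Hypothetical example: 10 samples split into 5 folds.
folds = list(k_fold_indices(n_samples=10, k=5))
for train_idx, test_idx in folds:
    print("train:", sorted(train_idx), "test:", sorted(test_idx))
```

Each sample lands in the test set exactly once, so the model is trained and scored k times and the scores can be averaged.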
The table below compares the AUCs obtained by changing max_depth or scale_pos_weight.
Sl.No. | scale_pos_weight | max_depth | AUC |
---|---|---|---|
1 | 1 | 7 | 0.602 |
2 | 1.18 x ratio | 7 | 0.754 |
3 | ratio | 7 | 0.754 |
4 | ratio | 5 | 0.752 |
5 | ratio | 20 | 0.758 |
ratio = (number of samples in class 0) / (number of samples in class 1)
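The ratio above can be computed directly from the training labels. The sketch below uses hypothetical labels (80 in class 0, 20 in class 1) and a hypothetical helper named `class_ratio`. The keys `max_depth` and `scale_pos_weight` are XGBoost parameter names that appear in the table; the `objective` and `eval_metric` values are our assumptions for illustration and are not confirmed settings from the notebook.

```python
from collections import Counter


def class_ratio(labels):
    """Return (count of class 0) / (count of class 1)."""
    counts = Counter(labels)
    if counts[1] == 0:
        raise ValueError("class 1 has no samples; ratio is undefined")
    return counts[0] / counts[1]


# Hypothetical training labels, not taken from our datasets.
y_train = [0] * 80 + [1] * 20
ratio = class_ratio(y_train)

# Parameter dictionary in the form XGBoost accepts.
params = {
    "objective": "binary:logistic",  # assumed: binary classification
    "eval_metric": "auc",            # assumed: evaluate with AUC
    "max_depth": 7,                  # one of the depths from the table
    "scale_pos_weight": ratio,       # up-weights class 1 by the ratio
}
print(params)
```

A dictionary like this could then be passed to XGBoost, for example as the first argument of `xgboost.train`, with the rows of the table corresponding to different values of `max_depth` and `scale_pos_weight`.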
Further analysis is recorded in our presentation.
- Our code can be found in this Python notebook: Education Loan
- The presentation can be found here for detailed analysis results: Presentation
Datasets: