What features are most important in determining whether a customer is "high risk" (credit defaulter)?
Maha Salman Cheema, Rachel Woodill, Zhao Wen
In this project, we aim to develop a machine learning model that can predict whether an individual is likely to be a credit card defaulter.
FlaskApp: FlaskApp > app.py
Raw Data: not uploaded to this repository due to file size; available at https://www.kaggle.com/datasets/mishra5001/credit-card
Data Cleaning Notebook: Documents > Data_cleanning_app_data_for_ROS.ipynb
Machine Learning Notebook: Documents > decision_tree_ROS_model.ipynb
Visualizations Notebook: Documents > visualizations_notebook.ipynb
Saved Figures (for visualization): Documents > figures > (all files)
Previous Attempts: Documents > Previous_Attempts > (all files)
Description of Features (for Dataset): Documents > Misc > columns_description.csv
Credit Card Defaulters (https://www.kaggle.com/datasets/mishra5001/credit-card)
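As a rough illustration of the ideas behind the notebook names above, the sketch below balances an imbalanced dataset and then ranks features by how well a single decision-tree split on each one separates defaulters from non-defaulters. It assumes that "ROS" stands for random oversampling, which the file names suggest but this README does not state. The toy rows, the feature names (`late_before`, `owns_car`), the `default` label, and all function names are made up for the example; this is not the code in the notebooks, and it uses only the Python standard library rather than whatever libraries the notebooks rely on.

```python
import random
from collections import Counter


def random_oversample(rows, label_key, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Sample with replacement to fill the gap (k is 0 for the largest class).
        balanced.extend(rng.choices(group, k=target - len(group)))
    rng.shuffle(balanced)
    return balanced


def gini(rows, label_key):
    """Gini impurity of the labels in rows (0.0 means a pure group)."""
    if not rows:
        return 0.0
    counts = Counter(row[label_key] for row in rows)
    return 1.0 - sum((n / len(rows)) ** 2 for n in counts.values())


def gini_gain(rows, feature, label_key):
    """Impurity reduction from splitting rows on a binary (0/1) feature,
    the quantity a decision tree compares when choosing a split."""
    left = [r for r in rows if r[feature] == 0]
    right = [r for r in rows if r[feature] == 1]
    weighted = (
        len(left) * gini(left, label_key) + len(right) * gini(right, label_key)
    ) / len(rows)
    return gini(rows, label_key) - weighted


# Hypothetical toy data: 2 defaulters and 9 non-defaulters.
toy_rows = (
    [{"late_before": 1, "owns_car": 0, "default": 1}] * 2
    + [{"late_before": 0, "owns_car": 1, "default": 0}] * 5
    + [{"late_before": 0, "owns_car": 0, "default": 0}] * 3
    + [{"late_before": 1, "owns_car": 1, "default": 0}] * 1
)

balanced = random_oversample(toy_rows, "default")
class_counts = Counter(row["default"] for row in balanced)

# Rank the features from most to least informative on the balanced data.
ranking = sorted(
    ["late_before", "owns_car"],
    key=lambda f: gini_gain(balanced, f, "default"),
    reverse=True,
)
print(class_counts, ranking)
```

Oversampling here only duplicates existing minority rows, so it changes the class balance without adding new information. In practice it is usually applied to the training split only, so that duplicated rows do not leak into the evaluation data.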
Data Model Implementation (25 points)
A Python script initializes, trains, and evaluates a model (10 points)
The data is cleaned, normalized, and standardized prior to modeling (5 points)
The model utilizes data retrieved from SQL or Spark (5 points)
The model demonstrates meaningful predictive power: at least 75% classification accuracy or 0.80 R-squared (5 points)
Data Model Optimization (25 points)
The model optimization and evaluation process, showing the iterative changes made to the model and the resulting changes in model performance, is documented either in a CSV/Excel table or in the Python script itself (15 points)
Overall model performance is printed or displayed at the end of the script (10 points)
GitHub Documentation (25 points)
GitHub repository is free of unnecessary files and folders and has an appropriate .gitignore in use (10 points)
The README is customized as a polished presentation of the content of the project (15 points)
Group Presentation (25 points)
All group members speak during the presentation. (5 points)
Content, transitions, and conclusions flow smoothly within any time restrictions. (5 points)
The content is relevant to the project. (10 points)
The presentation maintains audience interest. (5 points)