Welcome to the Scorecard Boosting repository! 🎉
Scorecard boosting is an innovative methodology for constructing credit scorecards that emerged in the domain of credit risk; it leverages advanced machine learning (ML) techniques, specifically gradient boosting.
🛠️ This work draws upon and extends the code from the presentation "Machine Learning in Retail Credit Risk: Algorithms, Infrastructure, and Alternative Data — Past, Present, and Future [S31327]" by Paul Edwards, Director, Data Science and Model Innovation at Scotiabank, and from Weights & Biases' notebooks "Interpretable Credit Scorecards with XGBoost".
Gradient boosting, which lies at the 💚 of scorecard boosting, is an ML technique that builds a strong predictive model by combining the outputs of multiple "weak" models, typically decision trees.
The algorithm works sequentially, with each new model focusing on correcting errors made by the previous ones. It minimizes a loss function by adding new models to the ensemble, and the final prediction is the sum of the predictions from all models.
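To make the sequential error-correction concrete, here is a minimal from-scratch sketch (illustrative only, not the repository's code) using squared loss, for which the negative gradient is simply the residual:

```python
# Minimal gradient boosting sketch: each new tree fits the residuals
# (the errors) left by the ensemble built so far.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=42)

n_rounds, learning_rate = 50, 0.1
prediction = np.full(len(y), y.mean())  # start from a constant base score
trees = []

for _ in range(n_rounds):
    residual = y - prediction  # errors made by the previous models
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    prediction += learning_rate * tree.predict(X)  # shrunken correction
    trees.append(tree)

# The final prediction is the sum of the predictions from all models.
final = y.mean() + learning_rate * sum(t.predict(X) for t in trees)
assert np.allclose(final, prediction)
```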
One of the best-known frameworks for gradient boosting with decision trees is XGBoost, an optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable.
For binary classification tasks like credit scoring, XGBoost performs a form of Logistic Regression. The algorithm is trained to minimize the log loss function, which is the negative log-likelihood of the true labels given a probabilistic model.
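Concretely, if $\hat{y}_i$ is the raw margin for observation $i$ (the sum of the tree outputs) and $p_i = \sigma(\hat{y}_i)$ the predicted probability, the log loss being minimized is:

```math
\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \Big[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \Big], \qquad p_i = \sigma(\hat{y}_i) = \frac{1}{1 + e^{-\hat{y}_i}}
```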
The algorithm used in XGBoost Logistic Regression follows the Newton-Raphson update method, which was initially described by J. Friedman (2001). XGBoost Logistic Regression also has ties to LogitBoost, which was described by J. Friedman et al. (2000).
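In this Newton-style scheme, each boosting round fits a tree to the first- and second-order derivatives of the log loss, and the optimal weight of a leaf with instance set $I$ has a closed form (here $\lambda$ is XGBoost's L2 regularization term):

```math
g_i = p_i - y_i, \qquad h_i = p_i (1 - p_i), \qquad w^{*} = -\frac{\sum_{i \in I} g_i}{\sum_{i \in I} h_i + \lambda}
```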
To familiarize yourself further with gradient boosting and XGBoost, follow the links below:
- How to explain gradient boosting
- Understanding Gradient Boosting as a gradient descent
- Around Gradient Boosting: Classification, Missing Values, Second Order Derivatives, and Line Search
- How Does Extreme Gradient Boosting (XGBoost) Work?
Boosted scorecards built on top of gradient-boosted trees improve performance metrics such as the Gini score and the Kolmogorov-Smirnov (KS) statistic compared to standard tools, while maintaining the interpretability of traditional scorecards. 📊 This is achieved by combining the best of both worlds: the interpretability of scorecards and the predictive power of gradient boosting. 🌐
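Both metrics are straightforward to compute from model scores; a minimal sketch on synthetic data (the data and model settings here are illustrative):

```python
# Gini = 2*AUC - 1; KS = max gap between the score distributions
# of good (y=0) and bad (y=1) customers.
from scipy.stats import ks_2samp
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=5000, weights=[0.9], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = XGBClassifier(n_estimators=100, max_depth=3).fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]  # predicted probability of "bad"

gini = 2 * roc_auc_score(y_test, proba) - 1
ks = ks_2samp(proba[y_test == 0], proba[y_test == 1]).statistic
print(f"Gini: {gini:.3f}, KS: {ks:.3f}")
```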
A boosted scorecard can be seen as a collection of sequential decision trees transformed into a traditional scorecard format. 🌲 Such a scorecard comprises the rules needed to compute a credit score, a measure of the creditworthiness of new or existing customers. Typically ranging from 300 to 850, this score can be further calibrated using the Points to Double the Odds (PDO) technique, a concept that extends naturally to gradient-boosted decision trees.
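The PDO scaling itself is an affine transform of the log-odds; a sketch with common but illustrative calibration choices (600 points at 30:1 good/bad odds, PDO = 20):

```python
# Points to Double the Odds (PDO): map log-odds to score points so that
# every `pdo` additional points double the odds of being a good customer.
import numpy as np

pdo, target_score, target_odds = 20.0, 600.0, 30.0  # illustrative calibration

factor = pdo / np.log(2)                             # points per unit of log-odds
offset = target_score - factor * np.log(target_odds)

def log_odds_to_score(log_odds: float) -> float:
    """Convert log(good/bad odds), e.g. a model's raw margin, to score points."""
    return offset + factor * log_odds

print(log_odds_to_score(np.log(target_odds)))      # 600.0 at 30:1 odds
print(log_odds_to_score(np.log(2 * target_odds)))  # 620.0: doubled odds adds PDO
```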
Below we can see how the number of boosting iterations affects the distribution of boosted credit scores among good and bad customers:
Additionally, we can see how the depth of individual tree estimators in the gradient boosting ensemble affects the distribution of boosted credit scores among good and bad customers:
This repository contains a collection of notebooks and scripts that demonstrate how to build boosted scorecards.
- `scorecard-boosting-demo`: example of a boosted scorecard with XGBoost
- `xgb_scorecard_constructor`: example of a boosted scorecard summary with XGBoost and the `xgb_scorecard_constructor` package (WIP)
- `other_notebooks`: other notebooks that demonstrate how to build scorecards with various ML techniques
  - Boosting for Credit Scorecards and Similarity to WOE Logistic Regression
  - Machine Learning in Retail Credit Risk: Algorithms, Infrastructure, and Alternative Data — Past, Present, and Future
  - Building Credit Risk Scorecards with RAPIDS
  - XGBoost for Interpretable Credit Models
- `credit_scorecard`: Project
- `vehicle_loan_defaults`: Artifacts 📊