
Scorecard Boosting 🚀

Welcome to the Scorecard Boosting repository! 🎉

Scorecard boosting is a methodology, which emerged in the Credit Risk domain, for constructing credit scorecards with advanced machine learning (ML) techniques, most notably gradient boosting.

🛠️ This work draws upon and extends the code from the presentation "Machine Learning in Retail Credit Risk: Algorithms, Infrastructure, and Alternative Data — Past, Present, and Future [S31327]" by Paul Edwards, Director, Data Science and Model Innovation at Scotiabank, and the Weights & Biases notebook "Interpretable Credit Scorecards with XGBoost".

Gradient Boosting 📈


Gradient boosting, which lies at the 💚 of scorecard boosting, is an ML technique that builds a predictive model by combining the outputs of multiple "weak" models, typically decision trees, to create a strong predictive model.

The algorithm works sequentially, with each new model focusing on correcting errors made by the previous ones. It minimizes a loss function by adding new models to the ensemble, and the final prediction is the sum of the predictions from all models.
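As an illustration, here is a minimal numpy-only sketch of this sequential loop for squared loss, using depth-1 decision stumps as the weak learners (`fit_stump` is a hypothetical helper written for this example, not part of any library):

```python
import numpy as np

# Toy 1-D regression data.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 200)
y = np.sin(x) + rng.normal(scale=0.1, size=200)

def fit_stump(x, residuals):
    """Fit a depth-1 'weak' tree (a stump): choose the threshold that
    minimizes squared error, predicting the mean residual on each side."""
    best = None
    for threshold in np.quantile(x, np.linspace(0.05, 0.95, 19)):
        left = residuals[x <= threshold]
        right = residuals[x > threshold]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, threshold, left.mean(), right.mean())
    _, t, left_value, right_value = best
    return lambda z: np.where(z <= t, left_value, right_value)

# Boosting loop: each stump is fit to the residuals (the negative
# gradient of squared loss) of the current ensemble, and the final
# prediction is the sum of all the scaled stump outputs.
learning_rate = 0.1
prediction = np.full_like(y, y.mean())  # initial constant model
for _ in range(100):
    stump = fit_stump(x, y - prediction)
    prediction += learning_rate * stump(x)

mse = np.mean((y - prediction) ** 2)  # far below the baseline variance of y
```

The learning rate shrinks each tree's contribution, which is what makes the ensemble correct errors gradually rather than overfit to the first few trees.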

One of the best-known frameworks for gradient boosting with decision trees is XGBoost. XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable.

For binary classification tasks like credit scoring, XGBoost performs a form of Logistic Regression. The algorithm is trained to minimize the log loss function, which is the negative log-likelihood of the true labels given a probabilistic model.

The algorithm used in XGBoost Logistic Regression follows the Newton-Raphson update method, which was initially described by J. Friedman (2001). XGBoost Logistic Regression also has ties to LogitBoost, which was described by J. Friedman et al. (2000).
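Concretely, for log loss the gradient for sample *i* is gᵢ = pᵢ − yᵢ and the Hessian is hᵢ = pᵢ(1 − pᵢ), and the Newton-Raphson step gives a leaf the value −Σg / (Σh + λ). A minimal numpy sketch of that update (`newton_leaf_weight` is an illustrative helper, not an XGBoost API):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_leaf_weight(y, raw_scores, reg_lambda=1.0):
    """Optimal leaf value for log loss via a single Newton-Raphson step:
    w = -sum(g) / (sum(h) + lambda), with g_i = p_i - y_i and
    h_i = p_i * (1 - p_i), where p_i is the current predicted probability."""
    p = sigmoid(raw_scores)
    g = p - y            # first derivative of log loss w.r.t. the raw score
    h = p * (1.0 - p)    # second derivative
    return -g.sum() / (h.sum() + reg_lambda)

# A leaf holding mostly positive labels gets a positive weight, pushing
# the predicted probability up for the samples that fall into it.
y = np.array([1, 1, 1, 0])
raw = np.zeros(4)        # start from raw score 0, i.e. p = 0.5
w = newton_leaf_weight(y, raw)
```

The λ term is XGBoost's L2 regularization on leaf weights, which shrinks the step when a leaf contains few samples.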

To familiarize yourself further with gradient boosting and XGBoost, see the Useful resources section at the end of this README.

Boosted scorecards 📈

Boosted scorecards built on top of gradient-boosted trees can improve performance metrics such as the Gini score and the Kolmogorov-Smirnov (KS) statistic relative to standard tools, while maintaining the interpretability of traditional scorecards. 📊 This is achieved by combining the best of both worlds: the interpretability of scorecards and the predictive power of gradient boosting. 🌐
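For reference, both metrics are straightforward to compute from model scores; a minimal numpy sketch (function names are illustrative, not from any particular library):

```python
import numpy as np

def ks_statistic(y_true, scores):
    """Kolmogorov-Smirnov statistic: the maximum gap between the cumulative
    score distributions of bad (y=1) and good (y=0) customers."""
    order = np.argsort(scores)
    y = np.asarray(y_true)[order]
    cum_bad = np.cumsum(y) / y.sum()
    cum_good = np.cumsum(1 - y) / (1 - y).sum()
    return np.max(np.abs(cum_bad - cum_good))

def gini_from_auc(auc):
    """The Gini score is a rescaling of ROC AUC: Gini = 2 * AUC - 1."""
    return 2.0 * auc - 1.0

# Perfectly separating scores give KS = 1; an AUC of 0.75 gives Gini = 0.5.
ks = ks_statistic([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
gini = gini_from_auc(0.75)
```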

A boosted scorecard can be seen as a collection of sequential decision trees transformed into a traditional scorecard format. 🌲 The scorecard comprises the rules needed to compute a credit score, an evaluative measure of the creditworthiness of new or existing customers. Typically ranging from 300 to 850, this score can be further calibrated using the Points to Double the Odds (PDO) technique, a concept that extends naturally to gradient-boosted decision trees.
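A common way to implement PDO scaling maps a model's log-odds onto the score range via score = offset + factor · ln(odds of being good), with factor = PDO / ln 2 and the offset anchored so that a chosen target odds maps to a chosen target score. A minimal sketch, assuming hypothetical anchor values (600 points at 50:1 good:bad odds, PDO = 20):

```python
import math

def score_from_probability(p_bad, pdo=20.0, target_score=600.0, target_odds=50.0):
    """Map a predicted probability of default to a scorecard score with PDO
    (Points to Double the Odds) scaling. The anchor values here are
    illustrative: 50:1 good:bad odds map to 600 points, and every
    pdo=20 points the odds of being good double."""
    factor = pdo / math.log(2)
    offset = target_score - factor * math.log(target_odds)
    odds_good = (1.0 - p_bad) / p_bad
    return offset + factor * math.log(odds_good)

# p_bad = 1/51 corresponds to 50:1 odds, i.e. the 600-point anchor;
# halving p_bad to 1/101 doubles the odds and adds exactly PDO points.
anchor = score_from_probability(1 / 51)
doubled = score_from_probability(1 / 101)
```

The same transformation can be applied leaf by leaf to a gradient-boosted ensemble, which is how tree margins become the point values of a boosted scorecard.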

Below we can see how the number of boosting iterations affects the distribution of boosted credit scores among good and bad customers:

[Figure: distributions of boosted credit scores for good and bad customers at different numbers of boosting iterations]

Additionally, we can see how the depth of individual tree estimators in the gradient boosting ensemble affects the distribution of boosted credit scores among good and bad customers:

[Figure: distributions of boosted credit scores for good and bad customers at different tree depths]

Repository Contents 📚

This repository contains a collection of notebooks and scripts that demonstrate how to build boosted scorecards.

  • scorecard-boosting-demo: example of a boosted scorecard with XGBoost
  • xgb_scorecard_constructor: example of a boosted scorecard summary with XGBoost and the xgb_scorecard_constructor package (WIP)
  • other_notebooks: other notebooks that demonstrate how to build scorecards with various ML techniques

Useful resources 📖