oesophago-gastric-cancer-risk-prediction-ml

Oesophago-gastric cancer risk prediction on EHR data: a comparison of a range of ML techniques

This repository provides the code used to investigate machine learning (ML) models for oesophago-gastric cancer risk prediction using primary care electronic health record (EHR) data, and to compare them with the existing Cancer Risk Assessment Tools (Cancer RATs) used in the UK (1, 2).

Models used

Logistic Regression
Support Vector Machine
Random Forest
Extreme Gradient Boosted Decision Trees (XGBoost)
Bernoulli Naïve Bayes
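A minimal sketch of the kind of comparison ml_models.py performs, assuming binary "feature present" inputs (which Bernoulli Naïve Bayes expects) and ROC AUC as the comparison metric. The synthetic data generator make_synthetic_data, the helper names and all hyperparameters are hypothetical illustrations, not the repository's actual code. XGBClassifier is used when xgboost is installed; otherwise scikit-learn's GradientBoostingClassifier stands in for it.

```python
# Sketch only: synthetic data, helper names and hyperparameters are assumptions,
# not taken from ml_models.py.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.svm import SVC


def make_synthetic_data(n_patients=2000, n_features=10, seed=0):
    """Hypothetical binary 'feature present' matrix with a rare positive class."""
    rng = np.random.default_rng(seed)
    X = rng.binomial(1, 0.2, size=(n_patients, n_features))
    # Made-up ground truth: only the first three features raise risk.
    logit = -3.0 + 1.5 * X[:, 0] + 1.0 * X[:, 1] + 0.8 * X[:, 2]
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))
    return X, y


def make_boosted_model(seed=0):
    """XGBoost if installed, otherwise scikit-learn gradient boosting as a stand-in."""
    try:
        from xgboost import XGBClassifier
        return XGBClassifier(n_estimators=100, max_depth=3, random_state=seed)
    except ImportError:
        from sklearn.ensemble import GradientBoostingClassifier
        return GradientBoostingClassifier(n_estimators=100, max_depth=3, random_state=seed)


def compare_models(X, y, seed=0):
    """Fit each classifier on a stratified training split and return test-set AUCs."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed)
    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Support Vector Machine": SVC(probability=True, random_state=seed),
        "Random Forest": RandomForestClassifier(n_estimators=200, random_state=seed),
        "Gradient Boosted Trees": make_boosted_model(seed),
        "Bernoulli Naive Bayes": BernoulliNB(),
    }
    aucs = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        # Predicted probability of the positive (cancer) class for each test patient.
        scores = model.predict_proba(X_test)[:, 1]
        aucs[name] = roc_auc_score(y_test, scores)
    return aucs


if __name__ == "__main__":
    X, y = make_synthetic_data()
    for name, auc in compare_models(X, y).items():
        print(f"{name}: AUC = {auc:.3f}")
```

Running the script prints one AUC per model. On real EHR data, class imbalance, cross-validation and hyperparameter tuning would need more care than this sketch shows.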

Requirements

Python and library versions used:

Python 3.7.10
pandas 1.2.4
scikit-learn 0.24.2
xgboost 1.5.0
numpy 1.20.2
matplotlib 3.3.4

Files

ml_models.py - development of ML classifiers and performance evaluation
RATs_tool.py - code used to implement the Cancer RAT for oesophago-gastric cancer (1) and generate its predictions
feature_importances.py - code used to generate feature importance graphs for models from ml_models.py
prediction_explanations.py - code used to generate explanations for individual predictions using models from ml_models.py
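The Cancer RATs report positive predictive values (PPVs) for single features and for pairs of features (1, 2). The sketch below assumes, as one plausible approach rather than the method used in RATs_tool.py, that a patient's score is the highest PPV among the single features and feature pairs they present with, and that a hypothetical threshold turns that score into a binary prediction. The function names, feature names and PPVs in PLACEHOLDER_PPV_TABLE are made-up placeholders, not values from (1).

```python
from itertools import combinations

# Placeholder PPVs (as fractions) for hypothetical features; NOT the published RAT values.
PLACEHOLDER_PPV_TABLE = {
    frozenset(["symptom_a"]): 0.01,
    frozenset(["symptom_b"]): 0.02,
    frozenset(["symptom_c"]): 0.03,
    frozenset(["symptom_a", "symptom_b"]): 0.05,
}


def rat_score(patient_features, ppv_table, default=0.0):
    """Highest PPV among the patient's single features and feature pairs (assumed rule)."""
    present = sorted(set(patient_features))
    candidates = [frozenset([f]) for f in present]
    candidates += [frozenset(pair) for pair in combinations(present, 2)]
    ppvs = [ppv_table[c] for c in candidates if c in ppv_table]
    # Patients with no feature in the table fall back to the default score.
    return max(ppvs, default=default)


def rat_predictions(patients, ppv_table, threshold):
    """Flag a patient (1) when their assumed RAT score reaches the hypothetical threshold."""
    return [int(rat_score(p, ppv_table) >= threshold) for p in patients]


if __name__ == "__main__":
    patients = [["symptom_a", "symptom_b"], ["symptom_a"], []]
    print([rat_score(p, PLACEHOLDER_PPV_TABLE) for p in patients])
    print(rat_predictions(patients, PLACEHOLDER_PPV_TABLE, threshold=0.02))
```

Scores produced this way could be compared with the ML models' predicted probabilities on the same patients, for example with the same ROC AUC metric.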

References

(1) Stapley S, Peters TJ, Neal RD, Rose PW, Walter FM, Hamilton W. The risk of oesophago-gastric cancer in symptomatic patients in primary care: a large case–control study using electronic records. Br J Cancer. 2013 Jan;108(1):25–31. https://doi.org/10.1038/bjc.2012.551

(2) Hamilton W. The CAPER studies: five case-control studies aimed at identifying and quantifying the risk of cancer in symptomatic primary care patients. Br J Cancer. 2009 Dec;101(S2):S80–6.

emmalucybriggs/oesophago-gastric-cancer-risk-prediction-ml