Oesophago-gastric cancer risk prediction on EHR data - comparison of a range of ML techniques
This repository contains the code used to investigate machine learning (ML) models that predict oesophago-gastric cancer risk from primary care electronic health record (EHR) data, and to compare them with the existing Cancer Risk Assessment Tools (Cancer RATs) used in the UK (1, 2). The following ML techniques are compared:
- Logistic Regression
- Support Vector Machine
- Random Forest
- Extreme Gradient Boosted Decision Trees
- Bernoulli Naïve Bayes
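As a rough illustration of how the five classifiers listed above could be set up side by side, here is a minimal sketch. It is not the code from ml_models.py: the data are synthetic binary features from scikit-learn's `make_classification`, the hyperparameters are arbitrary, and the AUC comparison is an assumed evaluation choice. If xgboost is not installed, the sketch falls back to scikit-learn's `GradientBoostingClassifier`, which is a different implementation of gradient boosting.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.svm import SVC

try:
    from xgboost import XGBClassifier
    boosted = XGBClassifier(n_estimators=50, random_state=0)
except ImportError:
    # Fallback (assumption): scikit-learn gradient boosting, not XGBoost itself.
    boosted = GradientBoostingClassifier(n_estimators=50, random_state=0)

# Synthetic stand-in for EHR-derived indicators; NOT the study data.
X, y = make_classification(n_samples=600, n_features=12, n_informative=6,
                           random_state=0)
X = (X > 0).astype(int)  # binarise so that Bernoulli Naive Bayes is applicable

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Hyperparameters below are illustrative defaults, not tuned values.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Support Vector Machine": SVC(probability=True, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Gradient Boosted Trees": boosted,
    "Bernoulli Naive Bayes": BernoulliNB(),
}

aucs = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Probability of the positive class, used to compute the ROC AUC.
    scores = model.predict_proba(X_test)[:, 1]
    aucs[name] = roc_auc_score(y_test, scores)

for name, auc in aucs.items():
    print(f"{name}: AUC = {auc:.3f}")
```

Keeping the models in one dictionary means each is trained and scored on exactly the same split, which makes the comparison easier to read.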
Libraries used and versions:
Python 3.7.10
pandas 1.2.4
scikit-learn 0.24.2
xgboost 1.5.0
numpy 1.20.2
matplotlib 3.3.4
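To check whether a local environment matches the versions listed above, a small script along these lines could be used. This is only a sketch: `check_versions` is a hypothetical helper, not part of the repository, and on Python 3.7 it assumes the `importlib_metadata` backport is installed, because `importlib.metadata` is only in the standard library from Python 3.8.

```python
try:
    from importlib import metadata  # standard library from Python 3.8
except ImportError:
    # Assumption: the importlib_metadata backport is installed on Python 3.7.
    import importlib_metadata as metadata

# Versions copied from the list above.
EXPECTED = {
    "pandas": "1.2.4",
    "scikit-learn": "0.24.2",
    "xgboost": "1.5.0",
    "numpy": "1.20.2",
    "matplotlib": "3.3.4",
}


def check_versions(expected):
    """Return {package: (wanted, installed, matches)}; installed is None if absent."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None
        report[package] = (wanted, installed, installed == wanted)
    return report


report = check_versions(EXPECTED)
for package, (wanted, installed, matches) in report.items():
    status = "ok" if matches else "differs"
    print(f"{package}: wanted {wanted}, installed {installed} ({status})")
```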
Files in this repository:
- ml_models.py - development of ML classifiers and performance evaluation
- RATs_tool.py - code used to implement the RATs model for oesophago-gastric cancer (1) and generate predictions
- feature_importances.py - code used to generate feature importance graphs for models from ml_models.py
- prediction_explanations.py - code used to generate explanations for individual predictions using models from ml_models.py
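For readers unfamiliar with feature importance graphs, the sketch below shows one common way to produce one: the impurity-based importances of a random forest, drawn as a horizontal bar chart with matplotlib. It does not reproduce feature_importances.py; the data are synthetic, the feature names (`feature_0`, `feature_1`, ...) are placeholders, and the choice of impurity-based importances is an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data with placeholder feature names; NOT the study data.
X, y = make_classification(n_samples=500, n_features=8, n_informative=4,
                           random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)

# Impurity-based importances; scikit-learn normalises them to sum to 1.
importances = forest.feature_importances_
order = np.argsort(importances)[::-1]  # most important feature first
ranked = [(feature_names[i], float(importances[i])) for i in order]

fig, ax = plt.subplots(figsize=(6, 4))
# Reverse for plotting so the most important feature appears at the top.
ax.barh([name for name, _ in ranked][::-1],
        [value for _, value in ranked][::-1])
ax.set_xlabel("Importance")
ax.set_title("Random forest feature importances (synthetic data)")
fig.tight_layout()

for name, value in ranked:
    print(f"{name}: {value:.3f}")
```

Impurity-based importances can favour features with many distinct values, so permutation importance (`sklearn.inspection.permutation_importance`) is a reasonable alternative to consider.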
(1) Stapley S, Peters TJ, Neal RD, Rose PW, Walter FM, Hamilton W. The risk of oesophago-gastric cancer in symptomatic patients in primary care: a large case–control study using electronic records. Br J Cancer. 2013 Jan;108(1):25–31. https://doi.org/10.1038/bjc.2012.551
(2) Hamilton W. The CAPER studies: five case-control studies aimed at identifying and quantifying the risk of cancer in symptomatic primary care patients. Br J Cancer. 2009 Dec;101(S2):S80–6.