oesophago-gastric-cancer-risk-prediction-ml

Oesophago-gastric cancer risk prediction on EHR data - comparison of a range of ML techniques

This repository contains the code used to investigate machine learning (ML) models for predicting oesophago-gastric cancer risk from primary care EHR data, and to compare them with the existing Cancer Risk Assessment Tools (Cancer RATs) used in the UK (1, 2).

Models used

  • Logistic Regression
  • Support Vector Machine
  • Random Forest
  • Extreme Gradient Boosted Decision Trees
  • Bernoulli Naïve Bayes
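
As a rough illustration of how the model families listed above can be compared, the sketch below fits each one with scikit-learn (and xgboost, if installed) and reports a cross-validated AUC. It is not taken from ml_models.py: the synthetic data, the default hyperparameters, and the choice of 5-fold ROC AUC as the metric are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import BernoulliNB
from sklearn.svm import SVC

# Synthetic stand-in for the EHR data (assumption): an imbalanced binary
# outcome with features binarised to mimic present/absent symptom flags.
X_cont, y = make_classification(
    n_samples=600, n_features=12, n_informative=5,
    weights=[0.8, 0.2], random_state=0,
)
X = (X_cont > 0).astype(int)

# Default or near-default settings (assumption); no tuning is attempted here.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Support Vector Machine": SVC(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Bernoulli Naive Bayes": BernoulliNB(),
}

# xgboost is a separate package, so only add it when it is importable.
try:
    from xgboost import XGBClassifier
    models["Extreme Gradient Boosted Decision Trees"] = XGBClassifier(
        n_estimators=100, random_state=0
    )
except ImportError:
    pass

# Mean ROC AUC over 5 stratified folds for each model.
auc = {
    name: cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    for name, model in models.items()
}

for name, score in auc.items():
    print(f"{name}: mean AUC = {score:.3f}")
```

Binarising the features keeps Bernoulli Naïve Bayes applicable; on real data the preprocessing would depend on how the EHR variables are coded.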

Requirements

Python and library versions used:

  • Python 3.7.10
  • pandas 1.2.4
  • scikit-learn 0.24.2
  • xgboost 1.5.0
  • numpy 1.20.2
  • matplotlib 3.3.4
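
To check an environment against the versions above, something like the following sketch may help. The helper name `check_versions` and the `PINNED` dictionary are hypothetical and not part of this repository; the check relies on each library exposing a `__version__` attribute.

```python
import importlib

# Versions listed in this README, keyed by import name
# (scikit-learn is imported as `sklearn`).
PINNED = {
    "pandas": "1.2.4",
    "sklearn": "0.24.2",
    "xgboost": "1.5.0",
    "numpy": "1.20.2",
    "matplotlib": "3.3.4",
}


def check_versions(pinned):
    """Return {module: (expected, found)} for every module whose version differs.

    `found` is None when the module cannot be imported or has no __version__.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            mismatches[name] = (expected, None)
            continue
        found = getattr(module, "__version__", None)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in check_versions(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not available'}")
```

Other versions may well work; an exact match only matters when trying to reproduce results as closely as possible.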

Files

  • ml_models.py - development of ML classifiers and performance evaluation
  • RATs_tool.py - code used to implement the RATs model for oesophago-gastric cancer (1) and generate predictions
  • feature_importances.py - code used to generate feature importance graphs for models from ml_models.py
  • prediction_explanations.py - code used to generate explanations for individual predictions using models from ml_models.py
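
The README does not state which methods feature_importances.py and prediction_explanations.py use, so the sketch below only illustrates two common options under stated assumptions: impurity-based importances from a random forest, and a per-feature breakdown of a logistic regression prediction into log-odds contributions. The synthetic data, the `feature_i` names, and the `plot_importances` helper are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Synthetic binary features standing in for EHR variables (assumption).
X_cont, y = make_classification(
    n_samples=500, n_features=8, n_informative=4, random_state=0
)
X = (X_cont > 0).astype(int)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # placeholder names

# Global view: impurity-based importances from a fitted random forest.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
importances = forest.feature_importances_
ranking = sorted(zip(feature_names, importances), key=lambda p: p[1], reverse=True)

# Local view: for a linear model, the log-odds of one prediction decompose
# exactly into coefficient * feature value per feature, plus the intercept.
logreg = LogisticRegression(max_iter=1000).fit(X, y)
patient = X[0]
contributions = logreg.coef_[0] * patient
log_odds = contributions.sum() + logreg.intercept_[0]


def plot_importances(names, values, path):
    """Save a horizontal bar chart of importances, smallest at the bottom."""
    import matplotlib
    matplotlib.use("Agg")  # render to file without a display
    import matplotlib.pyplot as plt

    order = np.argsort(values)
    fig, ax = plt.subplots()
    ax.barh(np.array(names)[order], np.asarray(values)[order])
    ax.set_xlabel("Importance")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)


if __name__ == "__main__":
    for name, value in ranking:
        print(f"{name}: {value:.3f}")
    plot_importances(feature_names, importances, "feature_importances.png")
```

The linear decomposition applies only to logistic regression; tree ensembles and kernel SVMs would need a model-agnostic explanation method instead.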

References

(1) Stapley S, Peters TJ, Neal RD, Rose PW, Walter FM, Hamilton W. The risk of oesophago-gastric cancer in symptomatic patients in primary care: a large case–control study using electronic records. Br J Cancer. 2013 Jan;108(1):25–31. https://doi.org/10.1038/bjc.2012.551

(2) Hamilton W. The CAPER studies: five case-control studies aimed at identifying and quantifying the risk of cancer in symptomatic primary care patients. Br J Cancer. 2009 Dec;101(S2):S80–6.