This work is part of the paper "Performance Effectiveness of Vital Parameter Combinations for Early Warning of Sepsis - An Exhaustive Study Using Machine Learning" (under review).
A sepsis prediction engine for hospitals that applies gradient-boosted decision trees (XGBoost) to features extracted from vital signs obtained from wearable sensors.
- Python 3
- Matplotlib
- NumPy
- Pandas
Sequence of hourly measurements of the following vital signs:
- heart rate
- respiratory rate
- SpO2 (blood oxygen)
- temperature
These measurements, obtained from patients at two different hospitals, are contained in the following zip files. Each zip file, when extracted, yields the individual patient data files.
The raw files come from the PhysioNet CinC 2019 database. They are then preprocessed (applying the inclusion and exclusion criteria, among other steps) to generate the curated datasets used for this study.
The input should be formatted so that the measurements span a minimum of 3 hours and a maximum of 6 hours.
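The span requirement above can be checked before a record is passed to a model. The following is a minimal sketch, not the repository's own code: it assumes one row per hourly measurement, and the names `has_valid_span` and `last_window` are hypothetical. Trimming a longer record to its most recent 6 hours is also an assumption about how over-long inputs might be handled.

```python
MIN_HOURS = 3  # minimum span stated in this README
MAX_HOURS = 6  # maximum span stated in this README


def has_valid_span(hourly_rows, min_hours=MIN_HOURS, max_hours=MAX_HOURS):
    """Return True if the record holds between min_hours and max_hours hourly rows."""
    return min_hours <= len(hourly_rows) <= max_hours


def last_window(hourly_rows, max_hours=MAX_HOURS):
    """Keep only the most recent max_hours rows (assumed handling of long records)."""
    return hourly_rows[-max_hours:]
```

A record with fewer than 3 hourly rows fails the check and cannot be repaired by trimming.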
Input data files are zipped and can be accessed from the repository:
Raw dataset
- hospitalA(BIDMC)_sepsis_raw.zip
- hospitalA(BIDMC)_controls_raw.zip
- hospitalB(Emory)_sepsis_raw.zip
- hospital B_(Emory)_controls_raw.zip
Curated dataset for this study
- hospitalA(BIDMC)_sepsis_curated.zip
- hospitalA(BIDMC)_controls_curated.zip
- hospitalB(Emory)_sepsis_curated.zip
- hospital B_(Emory)_controls_curated.zip
The algorithm is implemented as the following three Python modules:
Module: Tele-SEP-train-model.py
Parameters: each of the 15 sensor configurations (Si); each of the 16 timing tuples (W, L)
Output: AUROC for each (Si,W,L)
For each sensor configuration, the model with the highest AUROC is chosen for validation in the next module.
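The search space and the selection step of module 1 can be sketched as follows. This is an illustration under stated assumptions, not the code in Tele-SEP-train-model.py. It assumes the 15 sensor configurations are all non-empty subsets of the four vitals (2^4 - 1 = 15). It also assumes the 16 timing tuples pair W and L values of 3, 4, 5 and 6 hours each; the lead times are listed in this README, while the W values are a guess. The function names and the `auroc` dictionary are hypothetical.

```python
from itertools import combinations, product

VITALS = ["heart rate", "respiratory rate", "SpO2", "temperature"]


def sensor_configurations(vitals=VITALS):
    """All non-empty subsets of the vitals (15 subsets for four vitals)."""
    return [combo
            for size in range(1, len(vitals) + 1)
            for combo in combinations(vitals, size)]


# Assumed grid: W and L each take the values 3, 4, 5, 6 hours.
WINDOWS = [3, 4, 5, 6]
LEAD_TIMES = [3, 4, 5, 6]
TIMING_TUPLES = list(product(WINDOWS, LEAD_TIMES))


def best_timing_per_config(auroc):
    """Map each configuration to its best ((W, L), AUROC) pair.

    auroc: dict mapping (config, W, L) -> AUROC measured during training.
    """
    best = {}
    for (config, window, lead), score in auroc.items():
        if config not in best or score > best[config][1]:
            best[config] = ((window, lead), score)
    return best
```

Under these assumptions the grid holds 15 x 16 = 240 (Si, W, L) combinations, each of which yields one AUROC value.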
Module: Tele-SEP-ModelLoadRunOnly.py
Parameters: each of the 15 sensor configurations (Si); the best-performing timing tuple (WAUC, LAUC) corresponding to Si
import pickle
from sklearn.metrics import classification_report, confusion_matrix

model_filename = 'trained-models/XGBoost/XGB-Model-PPG-RR-Temp-L6-M4-verified.sav'
# load the model from disk
with open(model_filename, 'rb') as model_file:
    loaded_model = pickle.load(model_file)
# make predictions for test data (X_test and y_test must be prepared beforehand)
y_pred = loaded_model.predict(X_test)
# print classification report
print(classification_report(y_test, y_pred))
# confusion matrix
cnf_matrix = confusion_matrix(y_test, y_pred)
print(cnf_matrix)
Output: AUC and its difference from the AUC obtained in module 1 (for each sensor configuration)
Module: automation being implemented
Parameters: AUROC threshold value AUCmin; lead time threshold value Lmin
Output: From the list of Sensor configurations arranged in ascending order based on number and complexity of vitals, choose the first configuration Smin for which AUROC obtained in module 1 and corresponding lead time are greater than or equal to their respective threshold values AUCmin and Lmin.
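The selection rule of module 3 amounts to a first-match scan over the ordered list. A minimal sketch follows, assuming the module 1 results are available as a dictionary; the function name `choose_min_configuration` and the `results` structure are hypothetical, since this module is still being automated.

```python
def choose_min_configuration(ordered_configs, results, auc_min, l_min):
    """Return the first configuration meeting both thresholds, or None.

    ordered_configs: configurations in ascending order of number and
        complexity of vitals.
    results: dict mapping configuration -> (auroc, lead_time_hours)
        as obtained in module 1.
    """
    for config in ordered_configs:
        auroc, lead_time = results[config]
        if auroc >= auc_min and lead_time >= l_min:
            return config
    return None  # no configuration satisfies AUCmin and Lmin
```

Because the list is scanned in ascending order, the returned Smin is the simplest configuration that meets both thresholds, even if a richer configuration scores higher.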
Modules 1, 2 and 3 are run once at setup time. A subset of the best-performing pre-trained and validated models for various sensor configurations is also provided in the repository. At runtime, the following algorithm is used to predict sepsis for a new patient.
Parameters: patient’s wearable sensor configuration Sp; Patient_vitals = new patient data; lead time = 3, 4, 5, 6 hours
Subroutines: Choose the Tele-SEP models that match the patient’s wearable sensor configuration Sp. For Sp, retrieve the four sets of models corresponding to the four lead times. From each set, choose the best-performing model: Mp3, Mp4, Mp5, Mp6. Run these on Patient_vitals to compute the sepsis probabilities.
Output: The maximum of the four sepsis probabilities and the corresponding lead time resulting from the above computation
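The runtime step can be sketched as below. This is a simplified illustration, not the repository's implementation: each model is represented as a plain callable that returns a sepsis probability, and the name `predict_sepsis` is hypothetical. With a pickled classifier, such a callable would presumably wrap its probability output, which is an assumption here.

```python
def predict_sepsis(models_by_lead_time, patient_vitals):
    """Return (highest sepsis probability, corresponding lead time in hours).

    models_by_lead_time: dict mapping lead time (3, 4, 5, 6) to the
        best-performing model for the patient's sensor configuration Sp;
        each model is assumed to be a callable returning a probability.
    """
    probabilities = {
        lead_time: model(patient_vitals)
        for lead_time, model in models_by_lead_time.items()
    }
    # Pick the lead time whose model reports the highest probability.
    best_lead_time = max(probabilities, key=probabilities.get)
    return probabilities[best_lead_time], best_lead_time
```

If two lead times tie, this sketch returns the one that appears first in the dictionary.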