The purpose of this project is to build a reliable model that predicts volume responsiveness in Sepsis-3 patients based on their waveform and EMR records. A lesion study is also included to check which category of waveform features has the greatest effect on model performance.
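A lesion study retrains the model with one waveform category removed at a time and compares the resulting score against the full feature set. The sketch below is a minimal illustration, not the project's code. It assumes scikit-learn, uses cross-validated ROC AUC of a scaled logistic regression as the score, and treats the helper `lesion_study`, the group names (`arterial_bp`, `ecg`, `spo2`) and the column layout as hypothetical. Synthetic data stands in for merged_file_f.csv.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def lesion_study(X, y, groups, cv=None):
    """Return the baseline CV ROC AUC and the AUC drop when each feature group is removed.

    groups maps a (hypothetical) waveform category name to its column indices in X.
    """
    if cv is None:
        # A fixed random_state gives identical folds for every run, so scores are paired.
        cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    baseline = cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
    drops = {}
    for name, cols in groups.items():
        keep = np.setdiff1d(np.arange(X.shape[1]), cols)  # every column except this group
        score = cross_val_score(model, X[:, keep], y, cv=cv, scoring="roc_auc").mean()
        drops[name] = baseline - score  # a larger drop means the group matters more
    return baseline, drops


if __name__ == "__main__":
    # Synthetic stand-in: with shuffle=False the 3 informative features come first.
    X, y = make_classification(n_samples=400, n_features=9, n_informative=3,
                               n_redundant=0, n_clusters_per_class=1,
                               shuffle=False, random_state=0)
    groups = {"arterial_bp": [0, 1, 2], "ecg": [3, 4, 5], "spo2": [6, 7, 8]}
    baseline, drops = lesion_study(X, y, groups)
    print(f"baseline AUC: {baseline:.3f}")
    for name, drop in sorted(drops.items(), key=lambda kv: -kv[1]):
        print(f"without {name}: AUC drop {drop:.3f}")
```

Ranking groups by AUC drop gives a simple answer to "which waveform category matters most"; the real study may use a different model or metric.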
df5_wfflag.xlsx ICUSTAYS.csv
waveform_integration.py waveform_integration_deprec.py
integrated_mac.csv integrated_steph.csv integrated_windows.csv
waveform_eda.py
imputation_merge_data.py or imputation_merge_data.ipynb (either can be used): imputes missing waveform data, then merges it with the EMR data using df5_wfflag.xlsx and ICUSTAYS.csv
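The README does not state which imputation method imputation_merge_data.py uses. A minimal sketch, assuming pandas, per-column median imputation of the numeric waveform features, and an inner join with the EMR table on a stay identifier. The key `ICUSTAY_ID`, the column names and the helper `impute_and_merge` are hypothetical, and the role of df5_wfflag.xlsx is not reproduced.

```python
import numpy as np
import pandas as pd


def impute_and_merge(waveform: pd.DataFrame, emr: pd.DataFrame,
                     key: str = "ICUSTAY_ID") -> pd.DataFrame:
    """Median-impute numeric waveform columns, then inner-join with EMR rows on `key`."""
    wf = waveform.copy()
    numeric_cols = wf.select_dtypes(include="number").columns.drop(key, errors="ignore")
    # Fill each numeric waveform column's gaps with that column's median.
    wf[numeric_cols] = wf[numeric_cols].fillna(wf[numeric_cols].median())
    # Keep only stays present in both tables; assumes one row per stay in each table.
    return wf.merge(emr, on=key, how="inner", validate="one_to_one")


if __name__ == "__main__":
    waveform = pd.DataFrame({"ICUSTAY_ID": [1, 2, 3],
                             "map_mean": [70.0, np.nan, 80.0],
                             "hr_mean": [np.nan, 90.0, 100.0]})
    emr = pd.DataFrame({"ICUSTAY_ID": [1, 2, 4], "age": [65, 72, 50]})
    print(impute_and_merge(waveform, emr))
```

Imputing before the merge means the medians come from all waveform rows, including stays that the join later drops; computing them only on the training split would avoid leakage into the test set.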
merged_file_f.csv: the merged dataset used for training and testing the models
rf.ipynb or rf.py
xgboost_model.py
lassocv_util.py: contains the data-loading function and Lasso feature selection
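A rough sketch of what a data-loading function plus Lasso feature selection could look like, assuming pandas and scikit-learn's `LassoCV`. The target column name `volume_responsive` and the helpers `load_data` and `lasso_select` are hypothetical; the project's own implementation may differ (for example, it may use an L1-penalized logistic regression instead of `LassoCV`).

```python
import io

import numpy as np
import pandas as pd
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler


def load_data(path_or_buffer, target="volume_responsive"):
    """Read the merged table and split it into features X and label y."""
    df = pd.read_csv(path_or_buffer)
    y = df[target].to_numpy()
    X = df.drop(columns=[target])
    return X, y


def lasso_select(X: pd.DataFrame, y, cv=5):
    """Return the names of the columns that keep a non-zero LassoCV coefficient."""
    # Standardize so the L1 penalty treats every feature on the same scale.
    Xs = StandardScaler().fit_transform(X)
    lasso = LassoCV(cv=cv).fit(Xs, y)
    return list(X.columns[np.abs(lasso.coef_) > 1e-8])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = pd.DataFrame(rng.normal(size=(200, 5)), columns=[f"x{i}" for i in range(5)])
    y = (X["x0"] + 0.3 * rng.normal(size=200) > 0).astype(int)  # only x0 carries signal
    print(lasso_select(X, y))
    X_csv, y_csv = load_data(io.StringIO("a,b,volume_responsive\n1,2,0\n3,4,1\n"))
    print(list(X_csv.columns), y_csv)
```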
learningcurve.py: a learning-curve function shared by the different models
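A shared learning-curve helper could wrap scikit-learn's `learning_curve` so that every model is evaluated the same way. This is a sketch under assumptions: the name `compute_learning_curve`, the ROC AUC scoring and the default training fractions are illustrative, and plotting is left to the caller.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, learning_curve


def compute_learning_curve(estimator, X, y, scoring="roc_auc",
                           train_sizes=(0.1, 0.325, 0.55, 0.775, 1.0), cv=None):
    """Return training-set sizes with the mean train and validation scores at each size."""
    if cv is None:
        cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    sizes, train_scores, val_scores = learning_curve(
        estimator, X, y, train_sizes=train_sizes, cv=cv, scoring=scoring)
    # Each score array has shape (n_sizes, n_folds); average over the folds.
    return sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)


if __name__ == "__main__":
    X, y = make_classification(n_samples=300, n_features=10, random_state=0)
    rf = RandomForestClassifier(n_estimators=50, random_state=0)
    sizes, train_auc, val_auc = compute_learning_curve(rf, X, y)
    for n, tr, va in zip(sizes, train_auc, val_auc):
        print(f"n={n}: train AUC {tr:.3f}, validation AUC {va:.3f}")
```

A large, persistent gap between the train and validation curves points to overfitting; curves that are still rising at the largest size suggest more data would help.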
lassocv_lc.py
Contains everything:
- Random forest (model training, random search for hyperparameters, cross-validation, ROC curve, evaluation metrics, feature importance, learning curve)
- SVC (learning curve; hyperparameters are taken from build_and_evaluate_model.py)
- Lasso logistic regression (model training, hyperparameter tuning, ROC curve, cross-validation and evaluation metrics, which are also included in build_and_evaluate_model.py; lesion studies; learning curve)
- XGBoost (model training, feature importance, ROC curve, evaluation metrics)
- ROC curves for all models in one plot
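The random-search and combined-ROC steps listed above can be sketched with scikit-learn as follows. The search space, the helpers `tune_random_forest` and `roc_curves`, and the synthetic data are illustrative assumptions, not the settings used in rf.py or build_and_evaluate_model.py. XGBoost is left out so the sketch only needs scikit-learn and SciPy.

```python
import numpy as np
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def tune_random_forest(X_train, y_train, n_iter=10, random_state=0):
    """Randomized hyperparameter search for a random forest, scored by CV ROC AUC."""
    param_dist = {
        "n_estimators": randint(50, 200),
        "max_depth": [None, 3, 5, 10],
        "min_samples_leaf": randint(1, 10),
        "max_features": ["sqrt", "log2"],
    }
    search = RandomizedSearchCV(
        RandomForestClassifier(random_state=random_state),
        param_distributions=param_dist, n_iter=n_iter, cv=5,
        scoring="roc_auc", random_state=random_state)
    search.fit(X_train, y_train)
    return search.best_estimator_, search.best_params_


def roc_curves(models, X_test, y_test):
    """Return {name: (fpr, tpr, auc)} so every model can be drawn on one set of axes."""
    out = {}
    for name, model in models.items():
        scores = model.predict_proba(X_test)[:, 1]  # probability of the positive class
        fpr, tpr, _ = roc_curve(y_test, scores)
        out[name] = (fpr, tpr, roc_auc_score(y_test, scores))
    return out


if __name__ == "__main__":
    X, y = make_classification(n_samples=400, n_features=10, n_informative=4, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)
    rf, best_params = tune_random_forest(X_tr, y_tr, n_iter=5)
    lasso_lr = make_pipeline(StandardScaler(),
                             LogisticRegression(penalty="l1", solver="liblinear")).fit(X_tr, y_tr)
    svc = make_pipeline(StandardScaler(), SVC(probability=True, random_state=0)).fit(X_tr, y_tr)
    curves = roc_curves({"Random forest": rf, "Lasso LR": lasso_lr, "SVC": svc}, X_te, y_te)
    for name, (_, _, auc) in curves.items():
        print(f"{name}: test AUC {auc:.3f}")
    print("best RF params:", best_params)
    print("top RF features:", np.argsort(rf.feature_importances_)[::-1][:3])
```

Each `(fpr, tpr)` pair can then be drawn on the same Matplotlib axes, one `plot` call per model, to produce the combined ROC figure.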