- EDA.ipynb - Jupyter notebook for EDA of the training data.
- Signature_FE_and_Model.ipynb - Jupyter notebook for loading the training data, training the model pipeline, and evaluating the F1 score on the test set.
- Model analysis.ipynb - Jupyter notebook for evaluating different aspects and metrics of our models.
- model_hyperparameter_tuning.py - Script for performing grid search for a given model pipeline.
- tune_non_significant_cols.py - Script for finding the statistically non-significant columns to remove before training a model.
- tune_signature_model.py - Script for tuning the signature's hyperparameters.
- utils/data_handler.py - Script containing several functions for data loading and processing; its main function, get_model_prepared_dataset, loads a dataset from a given folder.
- utils/feature_selection.py - Script containing functions for selecting features prior to model training and prediction.
- utils/signature.py - Script for creating signature per patient prior to model training and prediction.
- utils/statistics.py - Script for statistical tests and analysis to use in EDA notebook.
- predict.py - The requested entry-point script; writes the requested predictions CSV.
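As a rough illustration of the loading step, here is a hedged sketch of what utils/data_handler.get_model_prepared_dataset could look like. Only the function name comes from the listing above; the signature, the one-CSV-per-patient folder layout, and the patient_id column are assumptions made for this example.

```python
import os
import tempfile
import pandas as pd

def get_model_prepared_dataset(folder: str) -> pd.DataFrame:
    """Load every CSV in `folder` into one DataFrame, tagging rows with a per-file ID."""
    frames = []
    for name in sorted(os.listdir(folder)):
        if name.endswith(".csv"):
            df = pd.read_csv(os.path.join(folder, name))
            # Assumption for illustration: the filename encodes the patient ID.
            df["patient_id"] = os.path.splitext(name)[0]
            frames.append(df)
    return pd.concat(frames, ignore_index=True)

# Demo on a throwaway folder holding two single-row patient files.
demo_dir = tempfile.mkdtemp()
pd.DataFrame({"hr": [80]}).to_csv(os.path.join(demo_dir, "p1.csv"), index=False)
pd.DataFrame({"hr": [95]}).to_csv(os.path.join(demo_dir, "p2.csv"), index=False)
data = get_model_prepared_dataset(demo_dir)
print(data.shape)  # (2, 2): two rows, columns hr and patient_id
```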
Run the notebook Signature_FE_and_Model.ipynb, which will save ./model.pkl, a pickle file containing the pipeline for the XGBoost classifier. To test on different files, run the script predict.py with an argument pointing to the desired data path (e.g. "/data/train/"); this will create prediction.csv, without headers, where the first column is the patient's ID and the second is the prediction.
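The pickle-then-predict flow above can be sketched end to end. This is a minimal stand-in, not the project's actual code: a LogisticRegression substitutes for the XGBoost pipeline, and the patient IDs and features are invented for the example; only the model.pkl filename and the headerless two-column prediction.csv format come from the text above.

```python
import pickle
import pandas as pd
from sklearn.linear_model import LogisticRegression  # stand-in for the XGBoost pipeline

# Train and pickle a stand-in model, mimicking how the notebook saves ./model.pkl.
X = pd.DataFrame({"f1": [0, 1, 0, 1], "f2": [1, 1, 0, 0]})
y = [0, 1, 0, 1]
with open("model.pkl", "wb") as f:
    pickle.dump(LogisticRegression().fit(X, y), f)

# Assumed predict.py flow: load the pickle, predict, and write a headerless CSV
# whose first column is the patient's ID and second is the prediction.
with open("model.pkl", "rb") as f:
    pipeline = pickle.load(f)
pd.DataFrame({
    "id": ["patient_1", "patient_2", "patient_3", "patient_4"],
    "pred": pipeline.predict(X),
}).to_csv("prediction.csv", header=False, index=False)
```

Writing with `header=False, index=False` is what produces the headerless two-column layout described above.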