- EDA.ipynb - Jupyter notebook for EDA of the training data.
- Signature_FE_and_Model.ipynb - Jupyter notebook for loading the training data, training the model pipeline, and evaluating the F1 score on the test set.
- Model analysis.ipynb - Jupyter notebook for evaluating different aspects and metrics of our models.
- model_hyperparameter_tuning.py - Script for performing grid search for a given model pipeline.
- tune_non_significant_cols.py - Script for finding the statistically non-significant columns to remove before training a model.
- tune_signature_model.py - Script for tuning the signature's hyperparameters.
- utils/data_handler.py - Script containing several functions for data loading and processing; its main function, get_model_prepared_dataset, loads a dataset from a given folder.
- utils/feature_selection.py - Script containing functions for selecting features prior to model training and prediction.
- utils/signature.py - Script for creating signature per patient prior to model training and prediction.
- utils/statistics.py - Script for statistical tests and analysis to use in EDA notebook.
- predict.py - The requested entry-point script; writes the requested predictions CSV.
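As a rough illustration of the loading step, here is a hedged sketch of what utils/data_handler.get_model_prepared_dataset could look like. Only the function name comes from the listing above; the signature, the one-CSV-per-patient folder layout, and the patient_id column are assumptions made for this example.

```python
import os
import tempfile
import pandas as pd

def get_model_prepared_dataset(folder: str) -> pd.DataFrame:
    """Load every CSV in `folder` into one DataFrame, tagging rows with a per-file ID."""
    frames = []
    for name in sorted(os.listdir(folder)):
        if name.endswith(".csv"):
            df = pd.read_csv(os.path.join(folder, name))
            # Assumption for illustration: the filename encodes the patient ID.
            df["patient_id"] = os.path.splitext(name)[0]
            frames.append(df)
    return pd.concat(frames, ignore_index=True)

# Demo on a throwaway folder holding two single-row patient files.
demo_dir = tempfile.mkdtemp()
pd.DataFrame({"hr": [80]}).to_csv(os.path.join(demo_dir, "p1.csv"), index=False)
pd.DataFrame({"hr": [95]}).to_csv(os.path.join(demo_dir, "p2.csv"), index=False)
data = get_model_prepared_dataset(demo_dir)
print(data.shape)  # (2, 2): two rows, columns hr and patient_id
```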
Run the notebook Signature_FE_and_Model.ipynb, which will save ./model.pkl, a pickle file containing the pipeline for the XGBoost classifier. To test on different files, run the script predict.py with an argument pointing to the desired data path (e.g. "/data/train/"); this will create prediction.csv, without headers, where the first column is the patient's ID and the second is the prediction.
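The pickle-then-predict flow above can be sketched end to end. This is a minimal stand-in, not the project's actual code: a LogisticRegression substitutes for the XGBoost pipeline, and the patient IDs and features are invented for the example; only the model.pkl filename and the headerless two-column prediction.csv format come from the text above.

```python
import pickle
import pandas as pd
from sklearn.linear_model import LogisticRegression  # stand-in for the XGBoost pipeline

# Train and pickle a stand-in model, mimicking how the notebook saves ./model.pkl.
X = pd.DataFrame({"f1": [0, 1, 0, 1], "f2": [1, 1, 0, 0]})
y = [0, 1, 0, 1]
with open("model.pkl", "wb") as f:
    pickle.dump(LogisticRegression().fit(X, y), f)

# Assumed predict.py flow: load the pickle, predict, and write a headerless CSV
# whose first column is the patient's ID and second is the prediction.
with open("model.pkl", "rb") as f:
    pipeline = pickle.load(f)
pd.DataFrame({
    "id": ["patient_1", "patient_2", "patient_3", "patient_4"],
    "pred": pipeline.predict(X),
}).to_csv("prediction.csv", header=False, index=False)
```

Writing with `header=False, index=False` is what produces the headerless two-column layout described above.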