Blood Glucose Prediction - Research Platform
Tasks remaining to finish the tool:
We will build heavily on the benchmark paper for this tool, to test it on real-life scenarios.
Preparations
- Create a research branch to use as the base branch for this work
- Remove the irrelevant code (models, metrics, documentation, example files)
Parsers
Summary: All the parsers should do the same thing: return a "raw" dataframe with all of the data types on the same time grid, without any further processing (see the sketch after this list).
- Rewrite tidepool parser to do this
- Rewrite nightscout parser to do this
- Create an apple health parser since it has heartrate as input
- Create an Oura Ring parser?
- When called, save the dataframe in the data/raw directory (filename metadata: start and end date, parser used, username?)
- Start making the CLI: Add an option to parse data
- Document: Nightscout cannot provide workouts, Tidepool can; document which data types the Apple Health parser provides
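A minimal sketch of the shared parser contract, assuming pandas and hypothetical names (`BaseParser`, `save_raw`); each concrete parser (Tidepool, Nightscout, Apple Health, Oura) would subclass this:

```python
from abc import ABC, abstractmethod
from datetime import datetime

import pandas as pd


class BaseParser(ABC):
    """Shared contract: return a raw dataframe with every data type
    aligned to the same time grid, and nothing more."""

    @abstractmethod
    def __call__(self, start_date: datetime, end_date: datetime, username: str) -> pd.DataFrame:
        """Return a dataframe on a 5-minute grid, one column per data type
        (e.g. CGM, insulin, carbs, heartrate)."""

    @staticmethod
    def to_time_grid(df: pd.DataFrame, freq: str = "5min") -> pd.DataFrame:
        # Assumes a DatetimeIndex; gaps in the source data stay as NaN.
        return df.resample(freq).mean()

    def save_raw(self, df: pd.DataFrame, parser_name: str, start: datetime, end: datetime) -> None:
        # Encode the metadata mentioned above in the filename.
        df.to_csv(f"data/raw/{parser_name}_{start:%Y-%m-%d}_{end:%Y-%m-%d}.csv")
```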
Preprocessors
- Define all prediction horizons --> add target columns (see the sketch after this list)
- Define how much history to use
- ALWAYS IN MG/DL!
- Save the df with a very descriptive name
- Test, train and val split
- Test: Does it work with both NS and tidepool?
- Update documentation (README, CLI comments)
- Add preprocessors for more (all?) models in benchmarking paper
- Add what-if events in the preprocessor(s)?
- Right now, preprocessors assume the numerical features to be CGM, insulin and carbs. This should be dynamic from user input (like in model training).
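A minimal sketch of the target-column step, assuming 5-minute CGM samples and a hypothetical `add_targets` helper:

```python
import pandas as pd


def add_targets(df: pd.DataFrame, horizons=(30, 60, 90, 120), sample_minutes=5) -> pd.DataFrame:
    """Add one target column per prediction horizon (in minutes) by
    shifting the CGM column backwards in time."""
    df = df.copy()
    for horizon in horizons:
        # target_60 holds the CGM value 60 minutes after each row.
        df[f"target_{horizon}"] = df["CGM"].shift(-horizon // sample_minutes)
    # The last rows have no future value to predict against; drop them.
    return df.dropna(subset=[f"target_{h}" for h in horizons])
```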
Models
- Implement some sample prediction models. Base the implementation on the benchmark paper
- Add the numerical and the categorical features as input (see the pipeline sketch at the end of this section)
- Make sure that all the examples are working
- Add CLI and have proper commenting
- Add models from benchmarking paper:
  - scikit models:
    - ARX (LinearRegressor)
    - Elastic net
    - Huber
    - Lasso
    - Random forest
    - Ridge
    - SVR, linear kernel
    - SVR, radial basis kernel
  - XGBoost
  - Gradient boosting trees
  - Keras:
    - LSTM
    - TCN
- Add documentation in readme
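One way to accept both numerical and categorical features, sketched with a scikit-learn ColumnTransformer (the column names and the Ridge placeholder are assumptions):

```python
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_model(numerical_features: list, categorical_features: list) -> Pipeline:
    # Scale numerical columns, one-hot encode categorical ones, then feed
    # everything into one of the scikit models listed above.
    preprocess = ColumnTransformer([
        ("num", StandardScaler(), numerical_features),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_features),
    ])
    return Pipeline([("preprocess", preprocess), ("model", Ridge())])


# Example: today's defaults plus a categorical feature from user input.
model = build_model(["CGM", "insulin", "carbs"], ["hour_of_day"])
```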
Evaluation
- Make sure all the conventional metrics are implemented, including:
  - ESOD (energy of the second-order differences; see the sketch at the end of this section)
  - TG (time gain)
- Handle both model comparisons and single-model evaluations. Store results in files!
- Plots: Handle comparisons
Real-time plots:
- Add plots that predict one specific trajectory, either in real time or for a given date (with "true" measurements alongside).
- Add an interactive plot?
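For ESOD, a sketch of the normalized formulation commonly used in the BG-prediction literature (energy of the second-order differences of the prediction relative to the measured signal):

```python
import numpy as np


def esod(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Normalized ESOD: values near 1 mean the prediction is roughly as
    smooth as the reference; large values indicate noisy predictions."""
    num = np.sum(np.diff(y_pred, n=2) ** 2)
    den = np.sum(np.diff(y_true, n=2) ** 2)
    return float(num / den)
```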
Settings
- Use global settings for unit conversion
- Give all relevant files this as a boolean input, fetched from the config in the main script (see the sketch after this list).
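A minimal sketch of the boolean unit setting, assuming PyYAML and a hypothetical `use_mgdl` key in config.yaml (18.018 mg/dL per mmol/L is the standard glucose conversion factor):

```python
import yaml

MGDL_PER_MMOLL = 18.018


def load_use_mgdl(path: str = "config.yaml") -> bool:
    # Default to mg/dL when the key is missing ("add default values" below).
    with open(path) as f:
        return bool(yaml.safe_load(f).get("use_mgdl", True))


def to_display_unit(value_mgdl: float, use_mgdl: bool) -> float:
    # Internals stay in mg/dL; convert only when presenting values.
    return value_mgdl if use_mgdl else value_mgdl / MGDL_PER_MMOLL
```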
Command line interface
- Main menu: Settings or model (see the click skeleton after this list)
- Choose either preprocessing or existing dataset
- Choose either model training or pretrained model
- Choose evaluation metrics
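A minimal click skeleton for this flow (command and option names are placeholders, not the final CLI):

```python
import click


@click.group()
def cli():
    """Blood glucose prediction research platform."""


@cli.command()
@click.option("--parser", type=click.Choice(["tidepool", "nightscout", "apple_health"]))
@click.option("--start-date")
@click.option("--end-date")
def parse(parser, start_date, end_date):
    """Fetch raw data and save it under data/raw."""


@cli.command()
@click.option("--model", help="Model to train, or path to a pretrained model.")
@click.option("--metrics", multiple=True, help="Evaluation metrics to report.")
def evaluate(model, metrics):
    """Use preprocessing or an existing dataset, then train or load a model and evaluate it."""


if __name__ == "__main__":
    cli()
```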
Documentation
- Make it really clear at the beginning what this repository is, and briefly explain how it works
- Make it clear how to start the program
- Make it clear how to get the raw data: either by providing your own dataset in a specific format, or by using one of the parsers
- Document datatypes
- Document the flow of the predictions
- Document the CLI
- Document file structure and base models. Make clear how people can add their:
- models
- evaluation metrics
- input values without using parsers
- pretrained models
- Document evaluation metrics
- Document plots
- Document assumptions:
- 5-minute intervals of CGM inputs
Code improvements (feedback from ChatGPT):
- Add default values to config-file
- Dynamic choices for click options (listing alternatives from modules; see the sketch after this list)
- Check for potential exceptions (input verification), and write clear error messages
- Output verbose information: add a --verbose flag to describe CLI commands
- Add CLI helper methods in a separate file to keep the CLI file clean
- Add unit testing
- Consistency: when using models/metrics etc., make sure you use the same type of CLI argument
- Duplicate code: find a way to eliminate it. It is in the preprocessors, model training ...
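For the dynamic click choices item above, one option is to discover the alternatives from the package at startup (a sketch, assuming one module per model under a hypothetical `models/` package):

```python
import pkgutil

import click

import models  # hypothetical package with one module per model


def available_models() -> list:
    # List module names instead of hard-coding the alternatives.
    return [m.name for m in pkgutil.iter_modules(models.__path__)]


@click.command()
@click.option("--model", type=click.Choice(available_models()))
def train(model):
    """Train the chosen model."""
```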