miriamkw/GluPredKit

Blood Glucose Prediction - Research Platform

Closed this issue · 0 comments

Tasks remaining to finish the tool:

We will build heavily on the benchmark paper for this tool, so we can test it in real-life scenarios.

Preparations

  • Create a research branch to use as the base branch for this work
  • Remove the irrelevant code (models, metrics, documentation, example files)

Parsers
Summary: All parsers should do the same thing: return a "raw dataframe" with all of the datatypes aligned on the same time grid, without any further processing.

  • Rewrite tidepool parser to do this
  • Rewrite nightscout parser to do this
  • Create an Apple Health parser, since it provides heart rate as an input
  • Create an Oura Ring parser?
  • When invoked, save the dataframe in the data/raw directory (filename metadata: start and end date, parser used, username?)
  • Start making the CLI: Add an option to parse data
  • Document: Nightscout cannot provide workouts, Tidepool can; document which datatypes the Apple Health parser provides
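
The shared parser contract could be sketched roughly like this, assuming pandas, a 5-minute grid, and the column names CGM/insulin/carbs (all illustrative; the real parsers define their own schema):

```python
import pandas as pd


def to_raw_dataframe(cgm, insulin, carbs, freq="5min"):
    """Align heterogeneous data streams onto one time grid.

    Each argument is a mapping of timestamp -> value; the column names
    and the 5-minute grid are assumptions, not GluPredKit's actual API.
    """
    frames = []
    for name, values in [("CGM", cgm), ("insulin", insulin), ("carbs", carbs)]:
        series = pd.Series(values, name=name)
        series.index = pd.to_datetime(series.index)
        frames.append(series)
    df = pd.concat(frames, axis=1).sort_index().astype(float)
    # Snap every stream onto the same grid: doses and meals are summed
    # within each bin, glucose readings are averaged.
    return df.resample(freq).agg({"CGM": "mean", "insulin": "sum", "carbs": "sum"})
```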

Preprocessors

  • Define all prediction horizons, and add a target column for each
  • Define how much history to use
  • ALWAYS IN MG/DL!
  • Save the df with a very descriptive name
  • Split into train, validation and test sets
  • Test: does it work with both Nightscout and Tidepool?
  • Update documentation (README, CLI comments)
  • Add preprocessors for more (all?) models in benchmarking paper
  • Add what-if events in the preprocessor(s)?
  • Right now, the preprocessors assume the numerical features are CGM, insulin and carbs. This should be dynamic based on user input (as in model training).
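
Adding the target columns could look like the following sketch, where each prediction horizon becomes one shifted column (column names like `target_30` are illustrative, not the tool's actual naming):

```python
import pandas as pd


def add_targets(df, horizons=(30, 60), freq_min=5):
    """Add one target column per prediction horizon (in minutes).

    Assumes the dataframe is on a regular grid of `freq_min` minutes
    and that glucose is stored in mg/dL, per the task list above.
    """
    out = df.copy()
    for horizon in horizons:
        steps = horizon // freq_min
        # Future glucose value becomes the supervised learning target.
        out[f"target_{horizon}"] = out["CGM"].shift(-steps)
    return out
```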

Models

  • Implement some sample prediction models. Base the implementation on the benchmark paper
  • Take the numerical and the categorical features as input
  • Make sure that all the examples are working
  • Add CLI and have proper commenting
  • Add models from benchmarking paper:
    • scikit models
      • ARX (LinearRegression)
      • Elastic net
      • Huber
      • Lasso
      • Random forest
      • Ridge
      • SVR, linear kernel
      • SVR, radial basis kernel
    • XGBoost
      • Gradient boosting trees
    • Keras
      • LSTM
      • TCN
  • Add documentation in readme
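
A shared base class could give all the models above one interface that takes the numerical and categorical features as input. This is a hypothetical sketch (the repo's real base class may differ), with a trivial zero-order-hold baseline to show the shape:

```python
from abc import ABC, abstractmethod


class BaseModel(ABC):
    """Sketch of a shared model interface; names are illustrative.

    Numerical and categorical feature names are supplied by the user,
    per the task above, rather than being hardcoded.
    """

    def __init__(self, prediction_horizon, numerical_features,
                 categorical_features=()):
        self.prediction_horizon = prediction_horizon  # in minutes
        self.numerical_features = list(numerical_features)
        self.categorical_features = list(categorical_features)

    @abstractmethod
    def fit(self, data):
        """Train on a sequence of samples; return self."""

    @abstractmethod
    def predict(self, data):
        """Return one glucose prediction (mg/dL) per sample."""


class LastValueModel(BaseModel):
    """Zero-order-hold baseline: predict the current CGM value."""

    def fit(self, data):
        return self

    def predict(self, data):
        return [sample["CGM"] for sample in data]
```

The scikit-learn, XGBoost and Keras models from the benchmark paper would each subclass the same interface, so the CLI and evaluation code can treat them uniformly.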

Evaluation

  • Make sure all the conventional metrics are implemented, including:
    • ESOD
    • TG
  • Handle both model comparisons and single-model evaluations. Store results in files!
  • Plots: Handle comparisons
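
A conventional metric like RMSE is a one-liner; trajectory-based metrics such as ESOD and TG would follow the same function shape but take the full prediction trajectory. A minimal sketch (the function name and signature are assumptions):

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error in mg/dL, one of the conventional
    metrics to implement alongside ESOD and TG."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
```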

Real-time plots

  • Add plots that show one specific predicted trajectory, either in real time or for a given date (with the "true" measurements alongside).
  • Add an interactive plot?
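
The trajectory plot could start as simple as this matplotlib sketch (labels and styling are placeholders; an interactive version could swap in plotly or mplcursors):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so the sketch also runs without a display
import matplotlib.pyplot as plt


def plot_trajectory(times, measured, predicted):
    """Plot one predicted trajectory alongside the true measurements."""
    fig, ax = plt.subplots()
    ax.plot(times, measured, label="measured CGM")
    ax.plot(times, predicted, linestyle="--", label="predicted")
    ax.set_xlabel("time")
    ax.set_ylabel("glucose [mg/dL]")
    ax.legend()
    return ax
```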

Settings

  • Use global settings for unit conversion
  • Update all relevant files to take this as a boolean input, fetched from the config in the main script.
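
Since everything is stored in mg/dL internally, the global setting only needs to drive display conversion. A sketch, assuming a single boolean fetched from the config (names are illustrative):

```python
MGDL_PER_MMOLL = 18.0182  # standard glucose conversion factor


def to_display_unit(value_mgdl, use_mgdl):
    """Convert an internal mg/dL value for display.

    `use_mgdl` is the global boolean setting from the config;
    when False, values are shown in mmol/L instead.
    """
    return value_mgdl if use_mgdl else round(value_mgdl / MGDL_PER_MMOLL, 1)
```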

Command line interface

  • Main menu: Settings or model
  • Choose either preprocessing or existing dataset
  • Choose either model training or pretrained model
  • Choose evaluation metrics
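
The menu flow above maps naturally onto subcommands. This stdlib-only argparse sketch mirrors the choices (the project itself uses click; all command and option names here are placeholders):

```python
import argparse


def build_cli():
    """Sketch of the CLI menu flow: parse data, then train or reuse a model."""
    parser = argparse.ArgumentParser(prog="glupredkit")
    sub = parser.add_subparsers(dest="command", required=True)

    parse_cmd = sub.add_parser("parse", help="fetch and save raw data")
    parse_cmd.add_argument("--parser", choices=["tidepool", "nightscout"])

    train_cmd = sub.add_parser("train", help="train a model or load a pretrained one")
    train_cmd.add_argument("--model", required=True)
    train_cmd.add_argument("--pretrained", action="store_true",
                           help="load a pretrained model instead of training")
    train_cmd.add_argument("--metrics", nargs="*", default=["rmse"],
                           help="evaluation metrics to compute")
    return parser
```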

Documentation

  • Make it really clear in the beginning what this repository is, and briefly explain how it works
  • Make it clear how to start the program
  • Make it clear how to get the raw data: either by having your own dataset in a specific format, or by fetching it with one of the parsers
  • Document datatypes
  • Document the flow of the predictions
  • Document the CLI
  • Document file structure and base models. Make clear how people can add their:
    • models
    • evaluation metrics
    • input values without using parsers
    • pretrained models
  • Document evaluation metrics
  • Document plots
  • Document assumptions:
    • 5-minute intervals of CGM inputs

Code improvements (feedback from ChatGPT):

  • Add default values to config-file
  • Dynamic choices for click-options (listing alternatives from modules)
  • Check for potential exceptions (input verification), and write clear error messages
  • Output verbose information: add a --verbose flag that describes CLI commands
  • Add helper methods for the CLI in a separate file to keep the CLI file clean
  • Add unit testing
  • Consistency: when using models/metrics etc., make sure the same type of CLI argument is used
  • Duplicate code: find a way to eliminate it. It appears in the preprocessors, model training ...
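
The dynamic-choices idea could work by scanning the plugin folders at startup, so the click options stay in sync with the modules. A sketch (the directory layout is an assumption):

```python
from pathlib import Path


def discover_choices(directory):
    """List available model/metric names by scanning a plugin directory.

    Each non-private .py file counts as one choice; the real CLI would
    point this at the project's models/ and metrics/ folders and feed
    the result into click's Choice type.
    """
    return sorted(p.stem for p in Path(directory).glob("*.py")
                  if not p.stem.startswith("_"))
```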