miriamkw/GluPredKit

Refactoring


Evaluator:

  • Create a base class with the mandatory functions and properties
  • Implement a class for each "penalty function" or evaluator
  • Be thoughtful about naming conventions
  • Go through the literature and look for different metrics
  • Document the different metrics with explanations of what they capture and references to literature where relevant. You could also mention some weaknesses of the specific metric.
  • Support y_pred and y_target as matrices with multiple prediction horizons, since some metrics depend on that (see the sketch below).
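
A possible shape for the evaluator module, sketched below with placeholder names (BaseMetric, RMSE) that are assumptions rather than the final API: a base class whose call raises NotImplementedError, and one subclass per metric.

```python
import numpy as np


class BaseMetric:
    """Base class that all metric implementations inherit from."""

    def __init__(self, name):
        self.name = name

    def __call__(self, y_target, y_pred):
        # y_target and y_pred may be 2D (samples x prediction horizons),
        # since some metrics depend on the horizon.
        raise NotImplementedError("Metric subclasses must implement __call__")


class RMSE(BaseMetric):
    """Root mean squared error, assuming glucose values in mg/dL."""

    def __init__(self):
        super().__init__(name="RMSE")

    def __call__(self, y_target, y_pred):
        y_target = np.asarray(y_target, dtype=float)
        y_pred = np.asarray(y_pred, dtype=float)
        return float(np.sqrt(np.mean((y_target - y_pred) ** 2)))
```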

Prediction models:

  • Create a base class with the mandatory functions and properties
  • Implement a class for each prediction approach.
    • Highest priority is the Loop algorithm.
    • Linear regression
  • Be thoughtful about naming conventions
  • Should have a fit() method and a predict() method. The output format should be clearly defined; take inspiration from scikit-learn (see the sketch after this list).
  • Document the different examples and write instructions for how people can implement their own algorithms. Write down some disclaimers about what users are expected to handle themselves (like information leakage between train and test data).
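
A hedged sketch of what the model base class and a linear regression subclass could look like, in scikit-learn style; the names (BaseModel, LinearRegressor), the pandas DataFrame input, and the "glucose" target column are assumptions, not the decided interface.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression


class BaseModel:
    """Base class that all prediction models inherit from."""

    def fit(self, df_train: pd.DataFrame):
        raise NotImplementedError("Model subclasses must implement fit()")

    def predict(self, df_test: pd.DataFrame):
        raise NotImplementedError("Model subclasses must implement predict()")


class LinearRegressor(BaseModel):
    """Linear regression baseline. The user is responsible for keeping
    train and test data separate (no information-leakage checks here)."""

    def __init__(self, target_column="glucose"):
        self.target_column = target_column
        self.model = LinearRegression()

    def fit(self, df_train):
        x = df_train.drop(columns=[self.target_column])
        y = df_train[self.target_column]
        self.model.fit(x, y)
        return self

    def predict(self, df_test):
        x = df_test.drop(columns=[self.target_column])
        return self.model.predict(x)
```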

tidepool_parser.py:

  • Ensure that tidepool_parser.py and the other code contain no unused code and are written efficiently
  • Move it into a folder "parsers", as we might later add a nightscout_parser, tandem_parser, etc.
  • Any unused code that can be removed?
  • This file should not handle prediction using pyloopkit. It should solely process Tidepool data into the format that we feed into the model base class.
  • Think thoroughly through what the data input format should be. JSON? DataFrames? Which metadata will be included?
  • Run a profiler and improve the efficiency of the file
  • Document the format of the output data from the tidepool_parser (or any other potential data source), which will be the input data to each model (see the sketch after this list).
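
Illustrative only, not the decided format: one option is a time-indexed pandas DataFrame with glucose, insulin, and carb columns, roughly like this (column names and units are assumptions).

```python
import pandas as pd

sample_output = pd.DataFrame(
    {
        "glucose": [108.0, 112.0, 117.0],  # mg/dL
        "insulin": [0.0, 1.5, 0.0],        # units delivered in the interval
        "carbs": [0.0, 30.0, 0.0],         # grams
    },
    index=pd.to_datetime(
        ["2023-04-19 12:00", "2023-04-19 12:05", "2023-04-19 12:10"]
    ),
)
sample_output.index.name = "date"
```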

Example scripts:

  • Make example scripts work with the refactored code/create new examples

Tests:

  • Write tests for all metrics
  • Write tests for all prediction models
  • Add tests to the test_all.py for all implemented metrics and models
  • Document how to run the tests, and why running them is a good way for people to check whether their own implementations are correct (see the sketch after this list)
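
For example, metric tests could pin down known values. A pytest-style sketch, reusing the hypothetical RMSE class from the evaluator sketch above:

```python
import numpy as np

from metrics import RMSE  # hypothetical import path for the RMSE sketch above


def test_rmse_perfect_prediction_is_zero():
    y_target = np.array([100.0, 120.0, 140.0])
    y_pred = np.array([100.0, 120.0, 140.0])
    assert RMSE()(y_target, y_pred) == 0.0


def test_rmse_known_value():
    # Errors of 3 and 4 mg/dL give RMSE = sqrt((9 + 16) / 2) = sqrt(12.5)
    y_target = np.array([100.0, 100.0])
    y_pred = np.array([103.0, 104.0])
    assert np.isclose(RMSE()(y_target, y_pred), np.sqrt(12.5))
```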

Cleanup:

  • Delete outdated files and folders

Plots (low priority):

  • Make a module for plotting results into, for example, a surveillance error grid (SEG); see the sketch below
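
Not an actual SEG implementation, just a hedged placeholder showing how a plot class could slot into such a module (a plain predicted-vs-measured scatter):

```python
import matplotlib.pyplot as plt


class ScatterPlot:
    """Placeholder plot class: predicted vs. measured glucose."""

    def draw(self, y_target, y_pred):
        fig, ax = plt.subplots()
        ax.scatter(y_target, y_pred, s=8)
        ax.plot([40, 400], [40, 400], linestyle="--", color="gray")  # identity line
        ax.set_xlabel("Measured glucose [mg/dL]")
        ax.set_ylabel("Predicted glucose [mg/dL]")
        return fig
```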

Questions:

  • In the proposed architecture, this repo only handles calculating penalties given pairs of measured and predicted values, and is completely agnostic to which prediction approach is used. Hence, do the tidepool_parser and the scripts retrieving Loop forecasts with pyloopkit belong in this repository, or should we create an additional repository for these tasks? @PorkShoulderHolder

UML class diagram

TO DO: Add a sample of the input for calculating penalties (dataframe with measured and predicted values), and output
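
Until that sample is added, a hedged illustration of what it might look like (made-up values, hypothetical column names):

```python
import pandas as pd

# Input: paired measured and predicted glucose values
penalty_input = pd.DataFrame(
    {
        "y_target": [108.0, 112.0, 117.0],  # measured glucose, mg/dL
        "y_pred": [110.0, 115.0, 120.0],    # predicted glucose, mg/dL
    }
)

# Output: a single number per metric, e.g. {"RMSE": 2.7}
```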

Notes from meeting 19.04:

  • Inspiration from https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_squared_error.html
  • The penalty functions should expect two arrays: predicted and measured values
  • "Extra" parameters (like squared vs. not squared) can be specific to the error function
  • Place for "all" the common error metrics to live (but they should be well documented!). Implement them. Describe them in a table, and refer to literature if necessary.
  • Predictions: .fit(), .predict(). Users have a model and some data; make it as easy as possible to use them together, using scikit-learn as a template
  • Interface for a model; preprocessing etc. is handled inside of the class. We are leaning on the user not to train on the test data (which is a downside). Return ONLY the predictions.
  • Predictions need their data inputs passed in separately; the model builds its own dataframe (or whatever it needs) internally
  • We need to clearly specify the formats of the insulin, glucose, carb data, etc. in the documentation
  • Be specific about the data that is coming in. People use only what they need.
  • Model return: one value or a list of predictions. This should be specified in the model (input: a list of offsets). But in general, we expect there to be a trajectory of predictions.
  • Model instances like in scikit-learn: create an instance and train it with the data it is fed (handled inside of the model). The fit method takes in some data; the predict method takes in some data. People must handle data leakage themselves. The model has a get_prediction_output_format function but also set_prediction_output_format.
  • tidepool_parser: we could create a dataloader object
  • tidepool_parser: separate parsing a report from running a prediction

Folders for models/evaluation: base classes implement the methods that all models/evaluators will have. Subclasses inherit from the base classes; any base-class method that a subclass has not implemented will raise a "not implemented" error.
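
Tying the hypothetical sketches above together, the intended user workflow could look roughly like this (the tiny DataFrames stand in for real parsed Tidepool data):

```python
import pandas as pd

df_train = pd.DataFrame(
    {"glucose": [110.0, 120.0, 130.0], "insulin": [0.0, 1.0, 0.0], "carbs": [0.0, 20.0, 0.0]}
)
df_test = pd.DataFrame(
    {"glucose": [115.0, 125.0], "insulin": [0.5, 0.0], "carbs": [10.0, 0.0]}
)

model = LinearRegressor()           # from the model sketch above
model.fit(df_train)                 # user is responsible for avoiding leakage
y_pred = model.predict(df_test)     # returns ONLY predictions
rmse = RMSE()(df_test["glucose"], y_pred)  # metric sketch above
print(f"RMSE: {rmse:.1f} mg/dL")
```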

Questions for next meeting:

  • Having issues with imports and ModuleNotFoundError in the tests folder
  • Assuming mg/dL as input in metrics.
  • model.predict() actually returns predictions AND true values
    • I would like to improve this output. It should include: prediction date, reference prediction date, and reference value (see the sketch after this list).
  • I have included future inputs for carbs and insulin in the predictions. If not, we would need to filter out future inputs for each prediction. It's still a good idea to filter out data more than 6 hours ahead, to avoid getting predictions "forever" into the future.
  • I can't get the retrospective correction numbers right, but I also don't know how it works, so it is hard to debug
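
Illustrative only: one way the improved predict() output could look, with made-up values and assumed column names.

```python
import pandas as pd

prediction_output = pd.DataFrame(
    {
        "reference_date": pd.to_datetime(["2023-04-19 12:00"] * 2),
        "reference_value": [108.0, 108.0],  # measured glucose at the reference date, mg/dL
        "prediction_date": pd.to_datetime(["2023-04-19 12:30", "2023-04-19 13:00"]),
        "predicted_value": [122.0, 131.0],  # mg/dL
    }
)
```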