Blood Glucose Forecasting Approaches

This repository contains simple approaches to training blood glucose forecasting models from a patient's past measurements. Models are trained and validated on the OhioT1DM dataset (2018 and 2020). Each approach is documented in its own Jupyter notebook.

Requirements

The "Ohio Data/" folder must be in the repository root directory with the following structure:

project/
|
|...
|
|--Ohio Data/
   |
   |--Ohio2018/
   |  |
   |  |--test/
   |  |  |
   |  |  |{patient_id}-ws-testing_processed.csv
   |  |
   |  |--train/
   |     |
   |     |{patient_id}-ws-training_processed.csv
   |
   |--Ohio2020/
   |  |
   |  |--test/
   |  |  |
   |  |  |{patient_id}-ws-testing_processed.csv
   |  |
   |  |--train/
   |     |
   |     |{patient_id}-ws-training_processed.csv    

# Python 3.8 or higher

pip3 install -r requirements.txt

jupyter-lab
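The notebooks assume the folder layout above. As a minimal sketch (the helper name `patient_csv_path` is my own, not part of the repository), resolving the CSV for one patient could look like this:

```python
from pathlib import Path

def patient_csv_path(root: str, year: int, split: str, patient_id: int) -> Path:
    """Build the path to a preprocessed OhioT1DM CSV for one patient.

    split must be "train" or "test"; file names follow the
    {patient_id}-ws-training_processed.csv / {patient_id}-ws-testing_processed.csv scheme.
    """
    suffix = "training" if split == "train" else "testing"
    return (Path(root) / "Ohio Data" / f"Ohio{year}" / split
            / f"{patient_id}-ws-{suffix}_processed.csv")
```

For example, `patient_csv_path(".", 2018, "train", 559)` points at `Ohio Data/Ohio2018/train/559-ws-training_processed.csv`.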

The following sections describe my approach.

Research

Prior Knowledge

Most of my experience is in the field of computer vision. When it comes to time series tasks, I only have experience with anomaly detection. I have rarely used sequential neural networks such as RNNs or LSTMs; in contrast, I have gained a lot of experience with CNNs.

Pytorch Forecasting

I found this framework by coincidence. It appears to be to PyTorch what fastai is (or what Keras is to TensorFlow). Besides forecasting models, the framework implements various features for data preparation.

For example:

  • Temporal encoding
  • Handling of missing values
  • Dataloader generation

(Blood glucose) Forecasting Papers

Unfortunately, as I am no longer a student, I cannot read some papers without being charged. I am therefore limited to the free papers I can find online. Nevertheless, here is a list of papers I read, or at least skimmed, in preparation:

  1. Temporal Fusion Transformer
    • A derivation of the classical transformer model, specifically designed for time series forecasting
    • Distinguishes between categorical and continuous data
    • Can also take future known values as input
    • Achieves state-of-the-art results in forecasting tasks
  2. N-HiTS
    • An enhanced version of N-BEATS
    • Unlike N-BEATS, N-HiTS predicts interpolation coefficients that are used to interpolate values across the time series
    • Also utilizes pooling layers per block for multi-rate input sampling
  3. Using N-BEATS to forecast blood glucose values
    • Uses a customized N-BEATS model to predict blood glucose values
    • The major difference is to include an LSTM inside the blocks
    • Also uses a customized loss function
  4. Using GANs
    • The generator generates the future blood glucose values up to a defined prediction horizon
    • The discriminator discriminates between ground truth and generated blood glucose values
    • My opinion:
      • Even though the authors' results look promising, I cannot imagine that this approach beats other models (LSTM, Transformer, ...)
      • In the past, I experienced how hard it can be to train GANs
      • Furthermore, they require a lot of computational resources
  5. Comparison of different methods for blood glucose prediction
    • A comparison between many approaches for blood glucose forecasting
    • An LSTM Ensemble model achieved the best results

Reinforcement Learning

As the task states, deep reinforcement learning (DRL) can be used to solve this problem. Intuitively this does not make much sense, as DRL is usually used to maximize a future outcome rather than to predict future values. Many papers on stock trading describe using DRL techniques on past and current time series of stock prices. The major difference between trading strategies and general time series forecasting is that trading strategies aim to maximize the future portfolio value instead of just predicting stock prices. For that kind of task, using DRL therefore makes sense.

For blood glucose, on the other hand, DRL would only make sense if one could, for example, measure what happens after taking a treatment.

Furthermore, I could not find a single paper on blood glucose forecasting using DRL methods, and I could rarely find prior time series forecasting work using reinforcement learning in general. Therefore I will only use "traditional" methods to solve this task.

Prior Knowledge

  • DRL:
    • Unfortunately I have close to zero prior knowledge about DRL
    • I know some basic terms (MDP, reward, Bellman Equation, return, ...)
    • I tried out Deep Q Learning for simple tasks
    • I read the AlphaZero paper out of curiosity. I understood the basic ideas but never reimplemented it on my own
    • I also read the ReBeL paper in the past

Approach

Data preparation

First of all, I took a deeper look into the data set. My data analysis is viewable in the "data.ipynb" notebook.

Summary

  • The data contains only continuous features sampled at 5-minute intervals, and no(!) categorical data
  • Many values (from target and non-target columns) are missing
    • There are extremely sparse columns like "carbInput" (98% missing)
    • But I believe that they can still contribute to the forecasting (e.g. after carbInput -> blood glucose should increase)
    • Except for the target column (cbg), missing values are interpolated by cubic splines
      • (One can argue that it doesn't make sense to interpolate, e.g. carbInput, because how would you interpolate if someone just ate?)
    • Because models work better with values between 0 and 1, all values are scaled accordingly (divided by the maximum values of the train set)
  • Correlations:
    • I scatter plotted the relations between every column and the respective cbg values
    • I noticed that the finger value is not as accurate as I expected (I expected an almost straight line)
    • Outliers:
      • There are only a few clear outliers (one carbInput, some hr)
      • Unfortunately, after looking at the points in time where they appear, I could not determine why this is the case
      • One can argue that removing them would be reasonable, but I decided to include them
  • Added data:
    • Some models perform better when taking temporal information as input
    • Therefore I added positional temporal information using sin/cos embeddings
  • Removed data:
    • If a data pair (input, label) contains at least one missing cbg value, it is removed from the dataset
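The sin/cos time embedding and the removal of windows with missing cbg values can be sketched as follows (a minimal numpy version; function names, the 24-step input length, and the NaN encoding of missing values are my assumptions, not the exact notebook code):

```python
import numpy as np

def time_of_day_embedding(minutes_since_midnight: np.ndarray) -> np.ndarray:
    """Encode time of day as sin/cos so that 23:55 and 00:00 end up close together."""
    angle = 2 * np.pi * minutes_since_midnight / (24 * 60)
    return np.stack([np.sin(angle), np.cos(angle)], axis=-1)

def make_windows(cbg: np.ndarray, n_past: int = 24, horizon: int = 6):
    """Slide over the series and keep only (input, label) pairs without missing cbg."""
    inputs, labels = [], []
    for start in range(len(cbg) - n_past - horizon + 1):
        window = cbg[start : start + n_past + horizon]
        if np.isnan(window).any():  # drop pairs containing any missing cbg value
            continue
        inputs.append(window[:n_past])
        labels.append(window[n_past:])
    return np.array(inputs), np.array(labels)
```

Any window overlapping a missing cbg value is skipped entirely, which matches the removal rule above but shrinks the dataset around gaps.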

Metrics

I focused on the metrics other researchers use to evaluate their models (rMSE and MAE). The models take the past 24 time steps (2 hours) as input and use a prediction horizon of 6/12 time steps (30 min/60 min).
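For reference, the two metrics computed over a forecast window (a straightforward numpy sketch, not the notebook's exact code):

```python
import numpy as np

def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Root mean squared error over all forecast steps."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean absolute error over all forecast steps."""
    return float(np.mean(np.abs(y_true - y_pred)))
```

rMSE penalizes large errors more strongly than MAE, which is why rMSE is always at least as large as MAE on the same predictions.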

Models

For each model, there is a separate notebook where I explain my approaches ({model_name}_approach.ipynb).

Models:

  1. N-BEATS (plain)
    • 12 blocks
    • loss function as described in this paper
  2. N-BEATS (paper)
    • 12 blocks
    • loss function as described in this paper
  3. LSTM
    • teacher forcing enabled during training
    • unidirectional
    • single layer
  4. LSTM
    • teacher forcing enabled during training
    • bidirectional
    • two layers
  5. Ensemble (LSTM, N-BEATS)
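The ensemble combines the LSTM and N-BEATS forecasts. As a minimal sketch under the assumption of simple (optionally weighted) averaging of per-model predictions (the exact combination scheme is in the notebook):

```python
import numpy as np

def ensemble_forecast(predictions, weights=None) -> np.ndarray:
    """Combine per-model forecasts, each of shape (horizon,), by (weighted) averaging."""
    stacked = np.stack(predictions)  # shape: (n_models, horizon)
    if weights is None:
        return stacked.mean(axis=0)
    w = np.asarray(weights, dtype=float)
    return (w[:, None] * stacked).sum(axis=0) / w.sum()
```

Averaging independently trained models tends to cancel out uncorrelated errors, which is consistent with the ensemble results below being the strongest.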

Results

N-Beats (plain)

Participant ID   rMSE (30 min)   MAE (30 min)   rMSE (60 min)   MAE (60 min)
559              28.83           20.84          39.             29.27
563              24.96           18.64          33.40           25.22
570              24.20           18.52          34.15           26.53
575              26.90           19.57          35.62           26.76
588              26.03           19.03          34.52           25.39
591              25.76           19.76          34.26           26.80
540              34.74           26.04          43.78           32.84
544              24.36           18.58          34.62           27.24
552              25.24           18.59          32.35           24.79
567              31.78           23.35          41.11           31.40
584              30.27           22.77          39.93           30.68
596              24.59           18.34          34.46           26.16
mean             27.50           20.34          36.67           27.76

N-Beats advanced

Participant ID   rMSE (30 min)   MAE (30 min)   rMSE (60 min)   MAE (60 min)
559              43.75           32.76          54.67           41.03
563              32.04           24.45          38.03           29.83
570              38.59           31.39          54.13           45.11
575              37.10           29.01          44.85           36.32
588              34.23           25.48          40.60           30.91
591              34.44           27.93          40.16           32.94
540              45.19           34.26          51.76           39.70
544              37.29           31.17          43.83           36.83
552              34.60           28.25          40.61           33.35
567              41.34           32.99          46.82           38.01
584              40.29           31.77          48.77           39.64
596              35.45           27.35          40.97           32.42
mean             39.11           31.05          45.77           36.34

Plain LSTM

Participant ID   rMSE (30 min)   MAE (30 min)   rMSE (60 min)   MAE (60 min)
559              14.88           9.81           35.17           26.02
563              14.71           9.96           29.45           22.01
570              12.18           8.31           28.45           21.73
575              16.94           10.51          31.57           23.57
588              14.60           9.92           30.13           22.36
591              16.37           11.10          31.08           23.72
540              18.36           12.42          38.44           28.69
544              13.97           9.60           31.11           24.78
552              13.62           9.11           28.45           21.85
567              17.86           11.65          36.71           27.37
584              16.62           11.31          32.98           25.02
596              13.98           9.33           29.42           21.95
mean             15.44           10.25          32.07           24.01

Multistacked Bidirectional LSTM

Participant ID   rMSE (30 min)   MAE (30 min)   rMSE (60 min)   MAE (60 min)
559              25.98           19.40          46.13           34.09
563              23.52           18.60          35.48           27.68
570              24.72           20.73          49.01           39.66
575              24.89           18.60          37.92           29.56
588              23.35           17.93          36.92           29.35
591              22.99           17.77          45.64           34.30
540              26.27           19.62          36.28           28.08
544              21.29           15.45          35.97           28.31
552              20.17           14.93          40.57           31.21
567              25.13           18.13          40.57           31.21
584              25.31           18.75          42.21           31.90
596              19.98           14.92          35.98           27.42
mean             23.72           17.90          40.40           31.00

Ensemble Model (Plain LSTM and Plain N-BEATS)

Participant ID   rMSE (30 min)   MAE (30 min)   rMSE (60 min)   MAE (60 min)
559              14.23           9.22           26.52           17.27
563              13.91           9.00           20.64           13.92
570              12.27           8.27           21.87           15.29
575              16.19           9.88           25.53           17.53
588              13.79           9.15           21.08           14.45
591              15.49           10.22          23.98           16.81
540              16.13           10.83          28.51           19.34
544              13.16           8.90           22.89           16.43
552              12.16           8.40           21.09           14.50
567              15.66           10.30          27.21           18.05
584              15.52           10.33          24.83           16.71
596              12.84           8.45           20.46           13.85
mean             14.35           9.41           23.87           16.18

Discussion

I am skeptical that these results are accurate. I could not find a single paper reporting better results than the approach described here, and since I only spent a relatively short amount of time on this task, it is hard to believe that such simple models achieve the best results. I suspect the main reason is that I removed all samples containing missing cbg values. I went through the repository looking for mistakes but could not find any.

Pytorch Forecasting

I also created a notebook that uses the library to build a Temporal Fusion Transformer. I did not focus on the results of this approach because I only adapted the tutorial from the docs, which would not have demonstrated my own deep learning skills. Even so, the resulting model outperformed all previously mentioned models. If I had had more time for this task, I would have studied the methods of this library further to improve my results.

Future Work

Due to the limited time I had to solve this task, there is much more to try out. I only implemented simple methods and ideas. Here is a list of things I would do to improve the given results.

  1. Data Processing:
    • Use an ARIMA model to interpolate the missing feature values instead of splines
    • Since ARIMA is quite powerful, I believe it can provide more accurate values
    • Data Augmentation: I do not yet know a suitable method for this kind of data, but I could research it
  2. Hyperparameter Tuning:
    • I only "guessed" good parameters instead of using well-known techniques like grid search or Bayesian search
    • The library optuna offers such functionality
    • Alternatively, I could use sklearn
  3. Regularization:
    • As mentioned in the Data preparation section, there are a few outliers
    • To avoid overfitting them, one can use regularization methods (dropout, batch norm, layer norm, etc.)
  4. Temporal Fusion Transformer:
    • As mentioned, the transformer can outperform the other models
  5. N-fold cross-validation:
    • I did not include this because of limited computational resources
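As an illustration of the hyperparameter tuning idea, a minimal grid search can be written without any extra library (the parameter names `lr` and `hidden_size` and the `train_and_validate` callback are hypothetical placeholders for whatever the notebooks would tune):

```python
import itertools

def grid_search(train_and_validate, param_grid: dict):
    """Try every parameter combination and return the one with the lowest
    validation loss. train_and_validate(**params) must return a float loss."""
    best_params, best_loss = None, float("inf")
    keys = list(param_grid)
    for values in itertools.product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        loss = train_and_validate(**params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss
```

Libraries like optuna replace this exhaustive loop with smarter (e.g. Bayesian) sampling of the same search space.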