eonu/sequentia

DeepGRU for multivariate time series input

Vaibhav-2022 opened this issue · 4 comments

Hello,
Thank you for sharing your work. Can you please tell me how I can use DeepGRU on a custom multivariate time series dataset? Do I need to write a custom DataLoader for it?

eonu commented

Hi @Vaibhav-2022, glad to see a new user :)

You will first have to implement a torch.utils.data.Dataset class which specifies how to fetch a single time series and label from your data.

This has to implement a __len__ and a __getitem__ function. As an example, you can see a dataset class that I defined in another repository of mine here, where the __getitem__ function takes an index and simply retrieves the audio data from the file at that index, then optionally applies some transformations (e.g. transforming the signal into some multivariate time series). It then returns that single transformed time series along with its label.
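
In case it helps, here is a rough skeleton of such a dataset class. The MyDataset name and the load_series helper are just placeholders, since how you load a single time series and its label depends entirely on your data:

import torch

def load_series(path):
    # Placeholder: replace with your own loading logic, returning a
    # (series, label) pair for the file at the given path.
    raise NotImplementedError

class MyDataset(torch.utils.data.Dataset):
    def __init__(self, files, transform=None):
        self.files = files          # one time series (and its label) per file
        self.transform = transform  # optional transformation applied to each series

    def __len__(self):
        # The number of time series in the dataset
        return len(self.files)

    def __getitem__(self, index):
        # Fetch a single time series and its label
        series, label = load_series(self.files[index])
        if self.transform is not None:
            series = self.transform(series)
        return series, label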

Once you've defined that, you need to create your dataset object(s) from that class, e.g.:

train_set = MyDataset(train_files)
test_set = MyDataset(test_files)

Then you can create a data loader using a predefined collate_fn that I made, which takes a batch of multivariate time series from the dataset objects and correctly pads all of the time series in each batch, as expected by torch RNN layers.

import torch
from sequentia.classifiers.rnn import DeepGRU, collate_fn

train_gen = torch.utils.data.DataLoader(train_set, collate_fn=collate_fn, batch_size=64, shuffle=True, num_workers=0)
test_gen = torch.utils.data.DataLoader(test_set, collate_fn=collate_fn, batch_size=64, num_workers=0)
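
For reference, the collate function essentially pads every series in a batch up to the length of the longest one, and returns the padded batch along with the original lengths and the labels. The sketch below isn't sequentia's exact implementation, but a padding collate function along these lines typically looks something like this (assuming each dataset item is a (series, label) pair where series has shape n_features x n_time_steps, and labels are integers):

import torch
from torch.nn.utils.rnn import pad_sequence

def pad_collate(batch):
    # Sketch only, not sequentia's exact collate_fn.
    # Each item is (series, label), where series has shape (n_features, n_time_steps).
    series, labels = zip(*batch)

    # Make each series time-major, i.e. (n_time_steps, n_features), and sort the
    # batch by length in descending order (as required for packed RNN sequences).
    series = [x.T for x in series]
    lengths = torch.tensor([x.shape[0] for x in series])
    lengths, order = lengths.sort(descending=True)
    series = [series[i] for i in order.tolist()]
    labels = torch.tensor([labels[i] for i in order.tolist()])

    # Zero-pad every series up to the longest one in the batch, giving a tensor
    # of shape (batch_size, max_time_steps, n_features).
    padded = pad_sequence(series, batch_first=True)
    return padded, lengths, labels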

If you haven't already, please check out this notebook, which shows an example of DeepGRU in use.

I'm happy to give more specific assistance if you can give more details about the kind of data you are working with.

@eonu Thank you for your detailed explanation. I wrote the torch.utils.data.Dataset in a similar way to how you have written it here, and then used a DataLoader to pass the data as input to DeepGRU for training, but I got a ValueError: too many values to unpack (expected 3). Maybe I am missing something; let me review my code and get back to you.
Question: do I need to create a class like this?

Details of my data: I have multiple CSV files, and in each CSV file there are 4 columns (1 for each feature) that are timestamp-synced. My idea is to combine these CSV files (I have written code for this) and then pass them to the DeepGRU network for training, along with a label for each. I am not sure whether the DeepGRU network will accept such input (since each column in a CSV file represents a single time series, and I have 4 columns in total, so together they form a multivariate time series). Let me know if you need more explanation, and feel free to correct me if I am going wrong here.

eonu commented

So if I understand correctly:

  • each CSV file represents a single time series
  • the features of the time series represented by each CSV file are the 4 columns in the file

So the 4 columns in each file all have equal length, but the number of rows in each CSV file can differ, right?

Also no, you don't need to implement the generator that I created in that file; you only really need the Dataset, plus some way of specifying which files belong to it. You can see that in the train_test_split function of the generator class I specify which files belong to the training and test data, then create the Dataset objects for those.
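
You don't have to split your files that way, though. For example, if you just have a list of CSV file paths, something as simple as scikit-learn's train_test_split would do (this assumes you're happy to use scikit-learn; any other way of partitioning the file list works just as well):

from sklearn.model_selection import train_test_split

# All of your CSV files, one multivariate time series per file
files = ['1.csv', '2.csv', '3.csv', '4.csv', '5.csv']

# Hold out 20% of the files for testing
train_files, test_files = train_test_split(files, test_size=0.2, random_state=0)

train_set = MyDataset(train_files)
test_set = MyDataset(test_files)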

What is the piece of code you are running which causes that ValueError?

Your code should look something like this:

# Imports (same as in the earlier snippet)
import torch
from sequentia.classifiers.rnn import DeepGRU, collate_fn

# Specify files to be used to create the dataset
train_files = ['1.csv', '2.csv', '3.csv']
train_set = MyDataset(train_files)

# Create a DataLoader from the dataset
train_gen = torch.utils.data.DataLoader(train_set, collate_fn=collate_fn, batch_size=64, shuffle=True, num_workers=0)

# Initialize the model
# (n_features is the number of features per time step, e.g. 4 in your case,
# and n_classes is the number of distinct labels)
model = DeepGRU(n_features, n_classes)

# Toggle training mode
model.train()

# Training loop
for batch, lengths, labels in train_gen:
    ...  # forward pass, loss computation and optimizer step for each padded batch

You should make sure that your __getitem__ function in the dataset class returns a tuple of (there's a sketch for your CSV case after this list):

  • a single torch tensor (with shape n_features x n_time_steps)
  • a label for the time series
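
For your CSV data specifically, the __getitem__ could look roughly like the sketch below. This assumes each CSV file contains only the 4 feature columns (drop any timestamp column first if there is one), uses pandas just as an example of reading the file, and the get_label helper is a placeholder for however you associate a label with each file:

import pandas as pd
import torch

def get_label(path):
    # Placeholder: derive the class label for a file however suits your data,
    # e.g. from the file name or from a separate labels file.
    raise NotImplementedError

class MyDataset(torch.utils.data.Dataset):
    def __init__(self, files):
        self.files = files  # one CSV file (4 timestamp-synced feature columns) per time series

    def __len__(self):
        return len(self.files)

    def __getitem__(self, index):
        # Read the CSV into an (n_time_steps x 4) frame, then transpose so the
        # returned tensor has shape (n_features x n_time_steps)
        frame = pd.read_csv(self.files[index])
        series = torch.tensor(frame.to_numpy(), dtype=torch.float32).T
        return series, get_label(self.files[index])
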
eonu commented

Closing, as the classifiers.rnn module was removed in #215.