variable input lengths
theDweeb opened this issue · 2 comments
Hello, I am currently creating custom datasets for https://github.com/locuslab/TCN which have input sequences with different numbers of timesteps. I have padded them with zeros so that they are all the same length (same as you), but some sequences are more than three times longer than others (that's a lot of zeros), which is causing non-ideal results. My question is whether you have any other strategies to combat this. I am trying to outperform LSTMs, which handle this problem with ease, and the only approaches I have come up with are 1) padding, and 2) upscaling/interpolating the smaller signals to fit the largest one (though this could have negative effects as well).
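For reference, the zero-padding I'm doing looks roughly like this (a minimal NumPy sketch; Keras ships a similar `keras.preprocessing.sequence.pad_sequences` helper):

```python
import numpy as np

def pad_to_max(seqs, value=0.0):
    """Right-pad 1-D sequences with `value` so they all match the longest one.
    Hypothetical helper, shown for illustration only."""
    max_len = max(len(s) for s in seqs)
    out = np.full((len(seqs), max_len), value, dtype=float)
    for i, s in enumerate(seqs):
        out[i, :len(s)] = s  # copy the real samples, leave the tail padded
    return out

batch = pad_to_max([[1, 2, 3], [4, 5], [6]])  # shape (3, 3)
```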
Thanks for any feedback
@theDweeb yes it's a common problem when batching. You have three main options:
- use a batch size of 1: https://github.com/philipperemy/keras-tcn/blob/master/tasks/multi_length_sequences.py
- bucket sequences of the same length and batch them together. For example, one batch of length 10, one of length 12, one of length 100.
- pad with 0 or a neutral value (the mean of the signal, for example).
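The bucketing option can be sketched in a few lines of plain Python/NumPy (illustrative only; `tf.data` also has a built-in `bucket_by_sequence_length` if you use input pipelines):

```python
import numpy as np
from collections import defaultdict

def bucket_by_length(seqs):
    """Group 1-D sequences by length so each bucket forms a rectangular
    batch with no padding at all (sketch, not the library's API)."""
    buckets = defaultdict(list)
    for s in seqs:
        buckets[len(s)].append(s)
    # stack each group into a (n_sequences, length) array
    return {length: np.asarray(group, dtype=float)
            for length, group in buckets.items()}

batches = bucket_by_length([[1, 2], [3, 4], [5, 6, 7]])
# two buckets: one of length 2 (two sequences), one of length 3 (one sequence)
```

You then feed each bucket to the model as its own batch, so no sequence carries any padding.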
Another option, as you mentioned, is to alter your data by downsampling it, which I don't really recommend, but I would have to see the data first.
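If you do go the resampling route (down- or upsampling), a simple linear version with `np.interp` looks like this. This is just a sketch; for real signals you may need proper anti-aliasing filtering (e.g. `scipy.signal.resample`) before downsampling:

```python
import numpy as np

def resample(signal, target_len):
    """Linearly resample a 1-D signal onto `target_len` evenly spaced points.
    Illustrative only; ignores aliasing concerns."""
    signal = np.asarray(signal, dtype=float)
    old_x = np.linspace(0.0, 1.0, num=len(signal))
    new_x = np.linspace(0.0, 1.0, num=target_len)
    return np.interp(new_x, old_x, signal)

stretched = resample([0.0, 1.0, 2.0, 3.0], target_len=7)  # upsample 4 -> 7
```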
Feel free to re-open the issue if it's not clear.