variable input lengths
theDweeb opened this issue · 2 comments
Hello, I am currently creating custom datasets for https://github.com/locuslab/TCN which have input sequences with different numbers of timesteps. I have padded them with zeros so that they are all the same length (same as you), but some sequences are more than three times longer than others (that's a lot of zeros), which is causing non-ideal results. My question is whether you have any other strategies to combat this. I am trying to outperform LSTMs, which handle this problem with ease, and the only approaches I have come up with are 1) padding, and 2) upscaling/interpolating the smaller signals to fit the largest one (though this could have negative effects as well).
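For reference, the zero-padding I'm doing looks roughly like this (a minimal NumPy sketch; Keras ships a similar `keras.preprocessing.sequence.pad_sequences` helper):

```python
import numpy as np

def pad_to_max(seqs, value=0.0):
    """Right-pad 1-D sequences with `value` so they all match the longest one.
    Hypothetical helper, shown for illustration only."""
    max_len = max(len(s) for s in seqs)
    out = np.full((len(seqs), max_len), value, dtype=float)
    for i, s in enumerate(seqs):
        out[i, :len(s)] = s  # copy the real samples, leave the tail padded
    return out

batch = pad_to_max([[1, 2, 3], [4, 5], [6]])  # shape (3, 3)
```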
Thanks for any feedback
@theDweeb yes it's a common problem when batching. You have three main options:
- use a batch size of 1: https://github.com/philipperemy/keras-tcn/blob/master/tasks/multi_length_sequences.py
- bucket sequences of the same length and batch them together. For example, one batch of length 10, one of length 12, one of length 100.
- pad with 0 or a neutral value (the mean of the signal, for example).
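The bucketing option can be sketched in a few lines of plain Python/NumPy (illustrative only; `tf.data` also has a built-in `bucket_by_sequence_length` if you use input pipelines):

```python
import numpy as np
from collections import defaultdict

def bucket_by_length(seqs):
    """Group 1-D sequences by length so each bucket forms a rectangular
    batch with no padding at all (sketch, not the library's API)."""
    buckets = defaultdict(list)
    for s in seqs:
        buckets[len(s)].append(s)
    # stack each group into a (n_sequences, length) array
    return {length: np.asarray(group, dtype=float)
            for length, group in buckets.items()}

batches = bucket_by_length([[1, 2], [3, 4], [5, 6, 7]])
# two buckets: one of length 2 (two sequences), one of length 3 (one sequence)
```

You then feed each bucket to the model as its own batch, so no sequence carries any padding.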
Another option, as you mentioned, is to alter your data by downsampling it, which I don't really recommend, but I would have to see the data first.
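If you do go the resampling route (down- or upsampling), a simple linear version with `np.interp` looks like this. This is just a sketch; for real signals you may need proper anti-aliasing filtering (e.g. `scipy.signal.resample`) before downsampling:

```python
import numpy as np

def resample(signal, target_len):
    """Linearly resample a 1-D signal onto `target_len` evenly spaced points.
    Illustrative only; ignores aliasing concerns."""
    signal = np.asarray(signal, dtype=float)
    old_x = np.linspace(0.0, 1.0, num=len(signal))
    new_x = np.linspace(0.0, 1.0, num=target_len)
    return np.interp(new_x, old_x, signal)

stretched = resample([0.0, 1.0, 2.0, 3.0], target_len=7)  # upsample 4 -> 7
```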
Feel free to re-open the issue if it's not clear.