Why make label length same as seq length?
Xanyv opened this issue · 8 comments
The create_inout_sequences function returns labels with the same length as the input sequence.
Like
in -> [0..99]
target -> [1..100]
But I have seen many LSTM works that shape the data like:
in -> [0..99]
target -> [100]
So I changed the create_inout_sequences function to return a 100-value sequence and 1 label for each sample. It raises an error:
Traceback (most recent call last):
File "E:/Trans.py", line 254, in
train_data, val_data = get_data()
File "E:/Trans.py", line 120, in get_data
train_sequence = create_inout_sequences(train_data, input_window)
File "E:/Trans.py", line 98, in create_inout_sequences
return torch.FloatTensor(inout_seq)
ValueError: expected sequence of length 100 at dim 2 (got 1)
I don't know how to fix it. I wonder why the label length is the same as the seq length? Is it possible to make the sequences comprise 100 inputs and 1 label?
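(For context: this ValueError comes from torch.FloatTensor trying to stack unequal-length (input, label) pairs into one rectangular tensor. A minimal reproduction, independent of the repository's code:)

```python
import torch

# One window of 100 input values paired with a single target value.
seq = [float(i) for i in range(100)]
inout_seq = [(seq, [100.0])]

# torch.FloatTensor expects a rectangular nested structure, but the input
# part has length 100 while the label part has length 1, so stacking fails:
torch.FloatTensor(inout_seq)
# ValueError: expected sequence of length 100 at dim 2 (got 1)
```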
Well, it is possible to shape the data into 100 training values and 1 label per sample, but you have to change the code accordingly.
The reason why the input and target sequences have equal lengths is simply better performance :)
The transformer returns a value for each timestep in a sequence. During my tests, I found that the results got better when I calculated the loss over the whole sequence rather than just for the last timestep.
As far as I can remember, the difference is not dramatic but noticeable. So it is possible to train the model with a single target value. I guess it also depends on the actual data which method works better.
To fix this error you need to change "create_inout_sequences" and the calculation of the loss - then it should work.
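(A minimal sketch of what such a change could look like, assuming the data is a flat list or 1-D tensor of values; names and shapes are illustrative, not the repository's exact code. Returning inputs and labels as two separate tensors also avoids the ragged stacking that caused the ValueError above:)

```python
import torch

def create_inout_sequences(input_data, tw):
    # 100-input / 1-label variant: each window of `tw` values is paired
    # with the single value that follows it.
    inputs, labels = [], []
    for i in range(len(input_data) - tw):
        inputs.append(input_data[i:i + tw])   # tw values
        labels.append(input_data[i + tw])     # the next value only
    return torch.FloatTensor(inputs), torch.FloatTensor(labels)

# Example with dummy data:
data = [float(i) for i in range(200)]
x, y = create_inout_sequences(data, 100)
print(x.shape, y.shape)  # torch.Size([100, 100]) torch.Size([100])

# The loss then has to be computed on the last timestep of the model
# output only, e.g. (assuming the output has shape (seq_len, batch, 1)):
#   loss = criterion(output[-1].squeeze(), batch_labels)
```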
Ohhh, thanks for your reply, and I'm sorry I didn't reply to you sooner. I have already changed "create_inout_sequences" to return 1 label, and I think the calculation of the loss is fine, but the model still outputs a sequence of the same length as the input, which should be changed.
After doing that, it still raises:
RuntimeError: The size of tensor a (2) must match the size of tensor b (256) at non-singleton dimension 2
I think something might be going wrong in the embedding?
Besides, I once tried simply changing "create_inout_sequences" to 1 training value and 1 label (so the lengths are equal :D) on three different datasets; it turns out the losses are quite similar, but training becomes much faster.
I fixed the problem! It works fine with 1 label. I tested 1 label and 100 labels on 4 different datasets; their losses turn out similar, while the validation plot with 100 labels looks better, so you are right! :D
If you don't mind, I have one last question:
Why is the decoder layer just a linear layer? Is it for simplicity? Have you ever tried changing it to a TransformerDecoder, like the paper describes? Would it perform better on time series?
Looking forward to your reply :)
Good work! I did not try to use the Transformer decoder, but that would be a super interesting project! I suspect that you would need to implement it the way it was described in the "Attention is all you need" paper to be really effective. Otherwise it could be more or less just an additional Transformer layer.
Because I had no time to implement the encoder/decoder schema from the paper, I decided to use just the encoder. As it turned out, the results got a bit better when I added a simple FF layer as the last layer. That's also very common for BERT-based models.
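(Roughly, the encoder-only idea looks like this; the hyperparameters are illustrative, and the positional encoding and attention mask used in the repository are omitted for brevity:)

```python
import torch
import torch.nn as nn

class EncoderOnlyModel(nn.Module):
    def __init__(self, feature_size=250, num_layers=1, nhead=10, dropout=0.1):
        super().__init__()
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=feature_size, nhead=nhead, dropout=dropout)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        # The simple FF layer used as the "decoder": maps each timestep
        # from the model dimension back to a single value.
        self.decoder = nn.Linear(feature_size, 1)

    def forward(self, src):            # src: (seq_len, batch, feature_size)
        memory = self.encoder(src)
        return self.decoder(memory)    # (seq_len, batch, 1)

# Quick shape check with dummy data:
model = EncoderOnlyModel()
out = model(torch.randn(100, 32, 250))
print(out.shape)  # torch.Size([100, 32, 1])
```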
Thanks for your patience! I typed the paper name "Attention is all you need" but it didn't appear in the comment XD
Besides, I compared this model with an LSTM. The LSTM losses are lower on some datasets.
Hi, thanks for the code!
Is it also possible to have:
in -> [0..60] and target -> [61:100]
like a sequence-to-sequence model? Would the decoder then need to be adapted?
@misterspook You can do that. This model maps one input value to one output value - if you calculate the loss over the last 39 values, you force the model to predict these values. That's what the multi-step code does; you just need to adjust these two values (and train it on your data, of course):
https://github.com/oliverguhr/transformer-time-series-prediction/blob/master/transformer-multistep.py#L27
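(A hedged sketch of that idea: only the last output_window predictions contribute to the loss. The model output, shapes, and names below are stand-ins, not the linked code:)

```python
import torch
import torch.nn as nn

output_window = 39          # how many future values the model is forced to predict
criterion = nn.MSELoss()

# Stand-ins for model(src) and the shifted target sequence, both with
# one value per timestep: (seq_len, batch, 1).
output  = torch.randn(100, 32, 1)
targets = torch.randn(100, 32, 1)

# Slicing both tensors to their last `output_window` timesteps means the
# loss - and therefore the training signal - only covers those predictions.
loss = criterion(output[-output_window:], targets[-output_window:])
```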
Thanks! I have implemented it like that, just slicing the outputs to make them the same shape as the target variable.