General questions.
mathematicsofpaul opened this issue · 5 comments
I was wondering if this uses teacher forcing during training? And what terms did you use as the SOS and EOS tokens? :)
I have been trying to get the transformer to work on time series for over a month now, and it seems nearly impossible using the nn.Transformer model provided by PyTorch. Did you by any chance get the decoder in the original transformer to work as well?
I don't know if you can call this teacher forcing, since it's not a recurrent model - but I noticed that the results got better when I started to calculate the MSE loss over the whole predicted sequence, instead of just the forecasted value.
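To illustrate the difference, here is a minimal sketch (in NumPy, with made-up shapes and variable names) of supervising the whole predicted sequence versus only the final forecast step:

```python
import numpy as np

def mse(pred, target):
    """Mean squared error over all elements."""
    return float(np.mean((pred - target) ** 2))

rng = np.random.default_rng(0)
seq_len = 10
pred = rng.normal(size=(seq_len,))    # model output for every position
target = rng.normal(size=(seq_len,))  # ground-truth sequence

# Loss over the whole predicted sequence: every position is supervised,
# so the model gets a gradient signal at each step of the sequence.
loss_full = mse(pred, target)

# Loss over only the final forecasted value: a single supervised position.
loss_last = mse(pred[-1:], target[-1:])
```

The intuition is that supervising every position gives the model far more training signal per sequence than a single target value at the end.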
Since all sequences are equal length, I did not use any SOS or EOS tokens.
One crucial step seems to be the positional encodings: without them, the model is not able to order the items and does not produce any meaningful output. I never tried to use the decoder.
@oliverguhr super interesting, did you have to scale up your positional encodings for them to have a significant effect?
No, I used the default implementation (as described in the paper) and it worked fine for me.
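For reference, the default sinusoidal encoding from "Attention Is All You Need" can be sketched like this (a generic NumPy version, not this repo's code):

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encodings as in 'Attention Is All You Need':

    PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    """
    pos = np.arange(seq_len)[:, None]        # (seq_len, 1)
    i = np.arange(0, d_model, 2)[None, :]    # (1, d_model // 2)
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)             # even dimensions
    pe[:, 1::2] = np.cos(angles)             # odd dimensions
    return pe

pe = positional_encoding(50, 64)
# These encodings are added to the input embeddings before the first layer.
```

No scaling beyond this is needed; the values are already in [-1, 1], the same range as typical normalized inputs.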
I'll close this issue for now. Feel free to reopen it if you have any questions - and feel free to share your project.
I was reading and debugging the multi-step implementation to understand it better. I've come across an interesting thing: it seems like the features and labels in training and evaluation are the same. Is this behavior correct? I thought that in a multi-step prediction problem the input features are delayed relative to the target labels, so that we have a window of the data's past behavior and aim to predict its future behavior.
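For comparison, here is a sketch of the windowing the questioner describes, where the label window starts right after the input window (a generic illustration with a hypothetical `make_windows` helper, not this repo's code):

```python
import numpy as np

def make_windows(series, input_len, horizon):
    """Split a 1-D series into (features, labels) pairs where the labels
    are shifted ahead of the features: the model sees the past `input_len`
    steps and must predict the next `horizon` steps."""
    X, y = [], []
    for start in range(len(series) - input_len - horizon + 1):
        X.append(series[start : start + input_len])
        y.append(series[start + input_len : start + input_len + horizon])
    return np.array(X), np.array(y)

series = np.arange(10.0)  # toy series 0..9
X, y = make_windows(series, input_len=4, horizon=2)
# X[0] = [0, 1, 2, 3], y[0] = [4, 5]
```

Whether training on shifted windows or on the full sequence is correct depends on how the model is fed at inference time, which is exactly the point the question raises.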