forecast about autoregression
reoml opened this issue · 5 comments
What I did: I trained Timer on one data file and obtained a checkpoint for the MS task. To test the model on a second file, I changed the dataset to read two files, pointed the test index at the second file, and set is_finetuning to zero.
What I expected: I then set all targets to zero (I assumed masked autoregression would be used during test prediction). However, I found that the predictions still depend on the labels. Is there a mistake in my understanding of Transformer decoding? How can I run autoregressive inference in Timer without using labels?
Hello, in the inference phase of Timer's autoregression, only the lookback window is needed. During inference, Timer produces the rolling prediction for every segment of the lookback window in parallel, and we only take the prediction of the last segment as the final result. This may differ from the token-by-token autoregressive rolling in language models, but the overall logic is consistent.
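A minimal sketch of the inference scheme described above, with a toy stand-in for the model (the real Timer Transformer and its patch length are not reproduced here; `fake_model` and `PATCH_LEN = 4` are assumptions for illustration only): each input patch yields a next-patch prediction in parallel, and only the last patch's prediction is kept.

```python
import numpy as np

PATCH_LEN = 4  # assumed toy patch length, not Timer's actual setting

def fake_model(patches):
    # Stand-in for Timer's Transformer: "predict" each next patch by
    # repeating the input patch's last value. Only the shapes matter here:
    # (num_patches, PATCH_LEN) -> (num_patches, PATCH_LEN).
    return np.repeat(patches[:, -1:], PATCH_LEN, axis=1)

def forecast_next_patch(lookback):
    # Split the lookback window into non-overlapping patches. The model emits
    # one next-patch prediction per input patch, in parallel; only the
    # prediction for the *last* patch is used as the forecast.
    patches = lookback.reshape(-1, PATCH_LEN)
    preds = fake_model(patches)
    return preds[-1]

lookback = np.arange(12, dtype=float)   # three patches of length 4
print(forecast_next_patch(lookback))    # [11. 11. 11. 11.]
```

Note that no future labels enter this function: the forecast is computed from the lookback window alone.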
My understanding of autoregression is as follows: given an initial window of features and labels, I use it to predict the next label, then shift the window forward one step, feeding the newly predicted label back in together with the previous features and labels (dropping the oldest one) to predict the next label. This works like a sliding window, as you mentioned.

However, after reviewing the source code, I noticed the following: the batch size is set to 1, and the code loops over batches, updating the predicted labels within each batch and then embedding them into X. This means the predictions from the current batch cannot be reused in the next batch. In addition, each batch feeds the labels from X into the model. If masking inside the model prevents it from seeing those labels, that might not be a big issue. But when I set the label data to all zeros, the predictions collapsed to a straight line, which indicates that the model does use label data during test inference. I am not sure where my understanding goes wrong, and I don't understand why the inference code is written this way, which is why I am asking this question.
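The rolling procedure described above can be sketched as follows. This is not Timer's actual inference code; `fake_model`, `PATCH_LEN`, and the sliding-window bookkeeping are assumptions used to make the feedback loop concrete: each predicted patch is appended to the context and the window slides forward, so no ground-truth labels are ever consumed.

```python
import numpy as np

PATCH_LEN = 4  # assumed toy patch length for illustration

def fake_model(patches):
    # Toy stand-in: predict each next patch as the input patch's last
    # value repeated. (num_patches, PATCH_LEN) -> (num_patches, PATCH_LEN).
    return np.repeat(patches[:, -1:], PATCH_LEN, axis=1)

def rolling_forecast(lookback, horizon):
    # Autoregressive rolling: take the last patch's prediction, append it
    # to the context, slide the window forward by one patch, and repeat
    # until the horizon is covered. Labels are never read.
    context = lookback.copy()
    out = []
    while len(out) * PATCH_LEN < horizon:
        patches = context.reshape(-1, PATCH_LEN)
        next_patch = fake_model(patches)[-1]
        out.append(next_patch)
        context = np.concatenate([context[PATCH_LEN:], next_patch])
    return np.concatenate(out)[:horizon]

print(rolling_forecast(np.arange(12, dtype=float), 8))
```

The feedback of predictions into the context is what makes this genuinely autoregressive, in contrast to a loop that re-reads ground-truth labels for each batch.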
Sorry, I'm a bit confused. In the scenario you describe, why would autoregressive prediction need to use data from other batches?
I believe the source of the confusion is that pred_len and patch_len are essentially redundant parameters. In the model's implementation, each forward pass produces predictions for the next patch_len timestamps, whereas in the dataloader, batch_x and batch_y are offset by pred_len timestamps. If pred_len and patch_len differ, this introduces a logical inconsistency. Note that in the forecasting scripts, both parameters are set to the same value.
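The mismatch described above can be made concrete with a small index check. The window arithmetic below is a simplification (the helper names and the `seq_len = 96` value are assumptions, and label-window overlap is ignored): the model's output covers the `patch_len` steps right after the input, while the dataloader's target starts `pred_len` steps after the input, so the two spans only coincide when the parameters are equal.

```python
def target_span(start, seq_len, pred_len):
    # Simplified dataloader indexing: batch_x covers [start, start+seq_len),
    # batch_y covers the pred_len steps that follow it.
    return (start + seq_len, start + seq_len + pred_len)

def output_span(start, seq_len, patch_len):
    # The model's forward pass predicts the patch_len steps
    # immediately after the input window.
    return (start + seq_len, start + seq_len + patch_len)

seq_len = 96  # assumed lookback length for illustration

# pred_len == patch_len: target and output cover the same timestamps.
print(target_span(0, seq_len, 24) == output_span(0, seq_len, 24))  # True

# pred_len != patch_len: the supervised target and the model output
# cover different spans, which is the logical inconsistency noted above.
print(target_span(0, seq_len, 48) == output_span(0, seq_len, 24))  # False
```

This is why the forecasting scripts keep the two parameters equal: it collapses the redundancy so the loss is computed over exactly the timestamps the model predicts.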