huggingface/transformers

Bug: InformerModel decoder_input torch.cat size of tensor mismatch error

jhzsquared opened this issue · 7 comments

Possible solution: should shift=0 at L2020?
Referencing: https://github.com/huggingface/transformers/blame/4fdf58afb72b0754da30037fc800b6044e7d9c99/src/transformers/models/informer/modeling_informer.py#L2020

I've trained and tested an Informer model, but when generating predictions I run into "RuntimeError: Sizes of tensors must match except in dimension 2..." at line 2029 in modeling_informer.py.

I broke it apart a bit, and after some experimenting it looks like shift=1 at line 2020 may have been mistakenly hardcoded: with it, the shape of reshaped_lagged_sequence at dimension 1 is always one less than that of repeated_features.
Alternatively, of course, repeated_features could avoid using k+1 at L2026. I'm not clear on the author's intuition behind using the shift or not.
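For concreteness, here is a minimal sketch (not the library code) of the slicing that get_lagged_subsequences performs at each generation step; the helper lagged_slice and the tensor shapes are illustrative assumptions. With lags_sequence = [0] and shift=1, the dimension-1 length comes out one short of the k+1 slice of repeated_features (or the full context at k=0):

```python
# Sketch of the slicing done by get_lagged_subsequences(..., shift=1);
# lagged_slice is a hypothetical stand-in, shapes are placeholders.
import torch

batch, context_length, input_size = 2, 24, 1
sequence = torch.randn(batch, context_length, input_size)

def lagged_slice(sequence, lag, subsequences_length, shift):
    # mirrors: indices = [lag - shift for lag in lags_sequence]
    lag_index = lag - shift
    begin = -lag_index - subsequences_length
    end = -lag_index if lag_index > 0 else None
    return sequence[:, begin:end, ...]

for k in range(3):  # steps of the greedy generation loop
    lagged = lagged_slice(sequence, lag=0, subsequences_length=1 + k, shift=1)
    print(k, lagged.shape[1], "vs features length", k + 1)
# k=0 -> 24 vs 1 (full context), k=1 -> 1 vs 2, k=2 -> 2 vs 3
```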

cw235 commented

It sounds like you are hitting a runtime error at line 2029 of modeling_informer.py while generating predictions, and you have traced it to the shift=1 argument at line 2020 producing mismatched tensor shapes.

Possible ways to address this:

  1. Change shift=1 to shift=0 at line 2020, so the lagged subsequence has the same length along dimension 1 as the features it is concatenated with (see the sketch after this list).
  2. Adjust the slice of repeated_features at line 2026 to use k instead of k+1, so its length matches reshaped_lagged_sequence.
  3. Check the documentation or ask the authors about the intent behind shift=1 before changing either line, since the shift may be deliberate.

After applying a change, rerun prediction generation to confirm that the shape mismatch and the RuntimeError are gone.
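As a quick check of option 1, the same slicing sketch as above (again, not the library code; lags_sequence = [0] and the shapes are assumptions) shows that with shift=0 the dimension-1 length becomes k+1 at every step, matching repeated_features[:, : k + 1]:

```python
# Same slicing sketch, but with shift=0 (option 1); values are placeholders.
import torch

sequence = torch.randn(2, 24, 1)  # (batch, context_length, input_size)

def lagged_slice(sequence, lag, subsequences_length, shift):
    lag_index = lag - shift
    begin = -lag_index - subsequences_length
    end = -lag_index if lag_index > 0 else None
    return sequence[:, begin:end, ...]

for k in range(3):
    lagged = lagged_slice(sequence, lag=0, subsequences_length=1 + k, shift=0)
    print(k, lagged.shape[1], "vs features length", k + 1)  # lengths now match
```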

cw235 commented

To expand on the above, the mismatch comes down to how the shift parameter interacts with the tensor slicing during prediction generation:

  1. Shift parameter: the shift=1 argument at line 2020 determines which slice of the past sequence is taken and may be what shortens it along dimension 1; setting shift=0, or adjusting how the shift is applied, may align the sizes.
  2. Reshaping logic: check that the slicing at lines 2020 and 2026 leaves reshaped_lagged_sequence and repeated_features with matching lengths before the concatenation at line 2029.
  3. Author's intent: comments or documentation around the shift parameter may explain why shift=1 was chosen and what it is meant to guarantee.
  4. Testing: try shift=0 and shift=1, inspect the resulting shapes, and validate the generated predictions once they match.

Working through these points should resolve the dimension mismatch during prediction generation with the Informer model.

@jhzsquared so the intention was that the model is learning the next step's distribution given the past as well as the covariates up till the time step at which one is forecasting...

can you paste in your lag_seq vectors that you are using?

I'm not using any lag right now, so I have the initial model input lags_sequence = [0].

And thanks! Conceptually that makes sense... functionally though, when k=0, get_lagged_subsequences with shift=1 returns a tensor of size context_length at dimension 1, while repeated_features[:, :k+1] is of course always size k+1 at dimension 1. When k>0, the lagged_sequence shape at dimension 1 is always one less than the size of the corresponding slice of repeated_features it is supposed to be concatenated with at line 2026.

right, so if you don't want lags, set that array to [1] and increase your context length by 1 more time step... can you check if that works?

Ohh okay did not realize that should have been [1]. That fixed it! Thank you so much!
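For anyone landing here later, a minimal sketch of the configuration change that resolved this; the concrete lengths are placeholders, not values from this thread:

```python
# Sketch of the fix suggested above: use a smallest lag of 1 instead of 0 and
# grow the context length by one time step to compensate. Values are placeholders.
from transformers import InformerConfig, InformerForPrediction

config = InformerConfig(
    prediction_length=12,
    context_length=24 + 1,   # previous context length plus one extra time step
    lags_sequence=[1],       # was [0]; with shift=1 the smallest usable lag is 1
)
model = InformerForPrediction(config)
```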