Question about per-token influence for time series transformer models
Closed this issue · 3 comments
Thank you for open sourcing your amazing work.
I tried to compute per-token influences using Hugging Face's PatchTST model. However, I encountered the following error:
RuntimeError: The model does not support token-wise score computation. Set compute_per_module_scores=True or compute_per_token_scores=False to avoid this error.
This is triggered by the DIMENSION_NOT_MATCH_ERROR_MSG in this file. However, it does not seem to be a dimensionality issue; I was wondering if it is because we don't need to use a tokenizer for time series.
My question is: Does the per-token influence apply exclusively to language transformer models? How can I make it work for time series transformer models like PatchTST? Thank you.
Thank you for reporting the issue! Have you tried setting compute_per_module_scores=True? While per-token influence computation can be applied to other transformer models (e.g., for binary classification), enabling this flag is important in such cases. This is because some modules have a token dimension while others do not, and aggregating all scores causes a dimensionality mismatch (e.g., adding matrices of dimension query_size x train_size x token_size and query_size x train_size). If this also causes an error, it would be great if you could share a small toy script to reproduce it; I can take a look at it later.
This has solved my issue. Thank you!
That is great to hear! You can optimize the code a bit by specifying only the modules that share the same token dimension in Task and turning off compute_per_module_scores (especially if the context length is large). But please feel free to reopen the issue if the speed becomes a bottleneck.