In iTransformer, is the time series projected and split into heads along the variate dimension or along the time dimension?
Great paper and a simple idea that actually turned out to be great!
But I'm trying to understand one thing: in self-attention, do you project the time series along the time dimension and split it into heads along the time dimension as well? Like here?
Then do you perform self-attention along time patch i for each head?
And then, for a single head, attention along variates for a given time patch?
Yes, we use multi-head attention. However, the time representations that are split into heads have already been mixed by the initial embedding, so the heads do not explicitly preserve the order of time patches.
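To make this concrete, here is a minimal sketch of the inverted setup, assuming PyTorch; the layer names and sizes (`embed`, `d_model`, `n_heads`, etc.) are illustrative and not taken from the repo's exact code. Each variate's whole series becomes one token, and the heads split the channel dimension of that token, not the time axis:

```python
import torch
import torch.nn as nn

# Illustrative sizes (not the paper's exact configuration).
B, T, N, d_model, n_heads = 32, 96, 7, 512, 8

x = torch.randn(B, T, N)            # (batch, time, variates)

# Inverted embedding: each variate's full series of length T becomes one token.
# The Linear mixes all T time steps into d_model channels, so temporal order
# is absorbed here, before attention ever runs.
embed = nn.Linear(T, d_model)
tokens = embed(x.transpose(1, 2))   # (B, N, d_model): one token per variate

# Multi-head self-attention over the N variate tokens. Heads split the
# d_model channel dimension: each head sees d_model // n_heads channels
# of every variate token, never a slice of the time axis.
attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
out, weights = attn(tokens, tokens, tokens)

print(out.shape)      # torch.Size([32, 7, 512])
print(weights.shape)  # torch.Size([32, 7, 7]) -- a variate-by-variate map
```

So the attention map is variate-by-variate, and within each head the channels are mixtures of all time steps produced by the embedding, which is why no head corresponds to a particular time patch.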