The meaning of modeling 'N'
Leopold2333 opened this issue · 2 comments
Hi, I noticed that S-Mamba feeds an input tensor of shape [B N E] into the Mamba block, where N is the number of features (variates) at each timestamp and E is the embedding generated from each univariate sequence by an MLP. Another work, TimeMachine, models the data in a similar way. However, a standard Mamba block takes [B L N] as input in order to model temporal dependencies along the sequence. This confuses me: what does it actually mean to model [B N E]? It seems that the model processes the univariate series one after another in order, rather than modeling the temporal evolution of the time series.
You can check the iTransformer method for the answer. Put simply, we first process the evolution of each variate along time separately, and then use Mamba to mix the information contained in the different variates. The alternative is to first fuse the variate information at each time point, and then use Mamba to process the evolution across time points. iTransformer showed the former to work better!
The difference is 'First processing Time Dependency' or 'First processing inter-Variate Correlation'.
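To make the two orderings concrete, here is a minimal NumPy sketch of how the tokens handed to the sequence model differ. It is only an illustration of the shapes, not S-Mamba's actual code: the sizes `B`, `L`, `N`, `E` and the weight matrices `W_var` and `W_time` are made-up, and a single bias-free linear map stands in for the embedding MLP.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: batch, lookback length, number of variates, embedding size.
B, L, N, E = 2, 96, 7, 16
x = rng.standard_normal((B, L, N))  # raw multivariate series, [B, L, N]

# Time-token layout: each time step is one token. The N variates observed at
# that step are fused into an E-dim vector, and the sequence model then scans
# along the L axis (temporal evolution).
W_var = rng.standard_normal((N, E))  # illustrative weights, not from the repo
time_tokens = x @ W_var  # [B, L, E]

# Variate-token (inverted) layout: each variate is one token. Its whole
# length-L history is embedded into an E-dim vector first, and the sequence
# model then scans along the N axis (inter-variate mixing).
W_time = rng.standard_normal((L, E))  # illustrative weights, not from the repo
variate_tokens = x.transpose(0, 2, 1) @ W_time  # [B, N, E]

print(time_tokens.shape)     # (2, 96, 16)
print(variate_tokens.shape)  # (2, 7, 16)

# Each variate token depends only on that variate's own series, so the time
# dependency is already encoded before any mixing across variates happens.
assert np.allclose(variate_tokens[0, 3], x[0, :, 3] @ W_time)
```

In the inverted layout the sequence axis seen by Mamba has length N, so the scan order is an order over variates rather than over time steps.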
Got it. Thank you!