klauscc/TALLFormer

Composition of Temporal Consistency Module

Hi,
I have a question about the code, specifically the config file. In the model definition, it isn't very clear which components make up the TCM.
Reading the paper alongside the code, I believe the TCM consists of two modules: SRMSwin and Transformer1DRelPos.
But I am not sure whether the SelfAttnTDM module is also part of it.
Could you shed some light on this?
Thank you!

Hi @SimoLoca, sorry for the late reply. The TCM is Transformer1DRelPos, which contains a few (i.e. 3) Transformer layers added on top of the pooled backbone features. SRMSwin is a pooling layer that pools BxTxCxHxW features to BxTxC'. SelfAttnTDM is not part of the TCM; it aims to produce multi-scale temporal features, i.e. it converts BxTxC' to [BxT/2xC'', BxT/4xC''', ...].
In my previous experiments, the TCM brought about a 1-2% mAP improvement on Thumos, while adding more parameters to SelfAttnTDM did not.
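
For anyone reading along, the shape flow described above can be traced with a small sketch. This is not the repository code. The function names `srm_pool`, `tcm`, and `tdm` are hypothetical stand-ins for the roles of SRMSwin, Transformer1DRelPos, and SelfAttnTDM, and every size is a made-up illustration value. It assumes the order pooler, then TCM, then TDM, and it keeps the channel count constant across pyramid levels for simplicity, whereas the reply allows it to differ per level (C'', C''', ...).

```python
# Hypothetical shape trace only: no real layers, no real config values.

def srm_pool(shape, out_channels):
    """Role of SRMSwin: pool B x T x C x H x W features down to B x T x C'."""
    b, t, _c, _h, _w = shape
    return (b, t, out_channels)

def tcm(shape):
    """Role of Transformer1DRelPos (the TCM): Transformer layers over the
    temporal axis. Assumed to leave the B x T x C' shape unchanged."""
    return shape

def tdm(shape, num_levels):
    """Role of SelfAttnTDM: build multi-scale temporal features by halving T
    at each level. Channels are kept fixed here as a simplifying assumption."""
    b, t, c = shape
    levels = []
    for _ in range(num_levels):
        t = t // 2
        levels.append((b, t, c))
    return levels

# Made-up sizes: batch 2, 32 frames, 1024 channels, 7x7 spatial grid.
backbone_out = (2, 32, 1024, 7, 7)
pooled = srm_pool(backbone_out, out_channels=512)  # B x T x C'
refined = tcm(pooled)                              # still B x T x C'
pyramid = tdm(refined, num_levels=3)               # [B x T/2 x C', B x T/4 x C', ...]

print(pooled)
print(pyramid)
```

Under these assumptions, only the middle step is the TCM. The pooler comes before it and the multi-scale module comes after it.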