Question about the video emotion recognition
jby1993 commented
Hi, thanks for releasing the code! I want to use the video emotion recognition network, and I have a question about the TransformerEncoder module it uses. It seems that the newly computed encoded_feature overwrites the encoded_feature previously computed with the alibi mask. This does not match the description in the paper.
I also wanted to ask, how long do you usually set the sequence length T when using it?
radekd91 commented
Can you be a bit more specific with your question? What lines of code are you referring to?
Thanks to alibi, the transformer should be fairly robust to varying T. In training we set T=150 (i.e. up to 6 s). Most MEAD videos are shorter than that, though.
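
For readers unfamiliar with the mechanism being discussed: ALiBi adds a per-head linear distance penalty to the attention logits instead of using positional embeddings, which is why the encoder tolerates sequence lengths different from the training T. Below is a minimal sketch of a bidirectional (symmetric) ALiBi bias; the function name, the symmetric variant, and the head-slope schedule are assumptions for illustration, not the repo's actual implementation.

```python
import torch

def alibi_bias(n_heads: int, T: int) -> torch.Tensor:
    """Build a symmetric ALiBi bias of shape (n_heads, T, T).

    Per-head slopes follow the geometric sequence m_h = 2^(-8h / n_heads);
    the bias is -m_h * |i - j|, added to attention logits before softmax.
    (Bidirectional variant assumed here; the original ALiBi is causal.)
    """
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)])
    pos = torch.arange(T)
    dist = (pos[None, :] - pos[:, None]).abs().float()   # (T, T), |i - j|
    return -slopes[:, None, None] * dist[None, :, :]     # (n_heads, T, T)

# Example: bias for 8 heads and the training length T=150.
bias = alibi_bias(n_heads=8, T=150)
# It would be added to the scaled dot-product logits, e.g.:
#   attn = softmax(q @ k.transpose(-1, -2) / sqrt(d) + bias)
```

Because the penalty depends only on token distance, the same formula produces a valid bias for any T at inference time, which is what makes varying sequence lengths unproblematic.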