radekd91/inferno

Question about the video emotion recognition


Hi, thanks for releasing the code! I want to use the video emotion recognition network, and I noticed a possible issue in the TransformerEncoder module it uses. It seems that the newly computed encoded_feature overwrites the encoded_feature previously computed with the ALiBi mask. This does not match the description in the paper.

I also wanted to ask, how long do you usually set the sequence length T when using it?

Can you be a bit more specific with your question? What lines of code are you referring to?

Thanks to ALiBi, the transformer should be fairly robust to varying T. In training we set T=150 (i.e. up to 6 s). Most MEAD videos are shorter than that, though.
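For anyone else reading this: the reason ALiBi tolerates a different T at inference time is that its attention bias depends only on relative position, so it can be rebuilt for any sequence length. Below is a minimal sketch (not the repo's actual implementation) of a bidirectional ALiBi-style bias using |i − j| distances; the slope schedule follows the geometric recipe from the ALiBi paper, and the function name `alibi_bias` is just illustrative.

```python
import numpy as np

def alibi_bias(num_heads: int, T: int) -> np.ndarray:
    """Build an additive ALiBi-style attention bias of shape (num_heads, T, T).

    Each head h gets a slope 2 ** (-8 * (h + 1) / num_heads), and the bias
    penalizes attention logits in proportion to the relative distance |i - j|.
    This is a bidirectional variant; the original ALiBi paper is causal.
    """
    slopes = np.array([2.0 ** (-8.0 * (h + 1) / num_heads)
                       for h in range(num_heads)])        # (num_heads,)
    pos = np.arange(T)
    dist = np.abs(pos[:, None] - pos[None, :])            # (T, T) relative distance
    return -slopes[:, None, None] * dist[None, :, :]      # (num_heads, T, T)

# The bias is a pure function of T, so a model trained with T=150
# can rebuild it for a longer clip at inference time.
bias_train = alibi_bias(num_heads=8, T=150)   # shape (8, 150, 150)
bias_eval = alibi_bias(num_heads=8, T=300)    # shape (8, 300, 300)
```

Since the bias carries no learned parameters, changing T only changes the size of this matrix, not the model weights, which is why varying clip lengths work.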