radekd91/inferno

Question about the video emotion recognition


Hi, thanks for releasing the code! I want to use the video emotion recognition network, and I noticed a possible issue in the TransformerEncoder module it uses. It seems that the newly computed encoded_feature overwrites the encoded_feature previously computed with the ALiBi mask. This does not match the description in the paper.

I also wanted to ask, how long do you usually set the sequence length T when using it?

Can you be a bit more specific with your question? What lines of code are you referring to?

Thanks to ALiBi, the transformer should be fairly robust to varying T. In training we set T=150 (i.e. up to 6 s). Most MEAD videos are shorter than that, though.
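For anyone else reading this: the reason ALiBi tolerates a different T at inference time is that its attention bias depends only on relative position, so it can be rebuilt for any sequence length. Below is a minimal sketch (not the repo's actual implementation) of a bidirectional ALiBi-style bias using |i − j| distances; the slope schedule follows the geometric recipe from the ALiBi paper, and the function name `alibi_bias` is just illustrative.

```python
import numpy as np

def alibi_bias(num_heads: int, T: int) -> np.ndarray:
    """Build an additive ALiBi-style attention bias of shape (num_heads, T, T).

    Each head h gets a slope 2 ** (-8 * (h + 1) / num_heads), and the bias
    penalizes attention logits in proportion to the relative distance |i - j|.
    This is a bidirectional variant; the original ALiBi paper is causal.
    """
    slopes = np.array([2.0 ** (-8.0 * (h + 1) / num_heads)
                       for h in range(num_heads)])        # (num_heads,)
    pos = np.arange(T)
    dist = np.abs(pos[:, None] - pos[None, :])            # (T, T) relative distance
    return -slopes[:, None, None] * dist[None, :, :]      # (num_heads, T, T)

# The bias is a pure function of T, so a model trained with T=150
# can rebuild it for a longer clip at inference time.
bias_train = alibi_bias(num_heads=8, T=150)   # shape (8, 150, 150)
bias_eval = alibi_bias(num_heads=8, T=300)    # shape (8, 300, 300)
```

Since the bias carries no learned parameters, changing T only changes the size of this matrix, not the model weights, which is why varying clip lengths work.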