Exploration-Lab/COGMEN

[FATAL] A question about feature dimension and num_head for CMU-MOSEI

sailist opened this issue · 0 comments

The total feature dimension of CMU-MOSEI in your paper is 883 (768 for text, 80 for audio, and 35 for visual). In your code, the concatenated features first pass through a transformer (named SeqContext), and the num_heads range for this transformer is [7, 15].

But 883 is a prime number, so no num_heads value in that range divides it.

import torch.nn as nn

# Any nhead in [7, 15] fails the same way; 14 is just one example.
nn.TransformerEncoderLayer(
    d_model=883,
    nhead=14,
)

AssertionError: embed_dim must be divisible by num_heads
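
The arithmetic can be checked without PyTorch. This is only a minimal sketch: `valid_head_counts` is a hypothetical helper I wrote for illustration, not something from the COGMEN code, and it assumes the three modality sizes are simply summed as described in the paper.

```python
# Hypothetical helper (not part of the COGMEN repo): list the head counts in
# [low, high] that evenly divide the embedding dimension.
def valid_head_counts(d_model, low, high):
    return [h for h in range(low, high + 1) if d_model % h == 0]


# Text + audio + visual feature sizes as reported in the paper.
d_model = 768 + 80 + 35  # 883

print(valid_head_counts(d_model, 7, 15))  # -> []
```

The empty list means no num_heads value in [7, 15] satisfies the divisibility requirement when d_model is 883.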