[FATAL] A question about feature dimension and num_head for CMU-MOSEI
sailist opened this issue · 0 comments
sailist commented
The total feature dimension of CMU-MOSEI in your paper is 883 (768 for text, 80 for audio, and 35 for visual). In your code, the concatenated features first pass through a transformer (named SeqContext). The num_heads range for this transformer in your code is [7, 15].
But 883 is a prime number, so no value in that range divides it evenly. For example:
import torch.nn as nn

nn.TransformerEncoderLayer(
    d_model=883,
    nhead=14,
)
AssertionError: embed_dim must be divisible by num_heads
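To double-check the arithmetic behind the error, here is a minimal sketch in plain Python (no PyTorch needed). The helper names `valid_heads` and `padded_dim` are hypothetical and are not part of the repository. The padding idea is only one possible workaround that I am assuming, not necessarily what the paper's experiments used.

```python
def valid_heads(d_model: int, lo: int, hi: int) -> list:
    """Return every head count in [lo, hi] that divides d_model evenly."""
    return [h for h in range(lo, hi + 1) if d_model % h == 0]


def padded_dim(d_model: int, nhead: int) -> int:
    """Smallest multiple of nhead that is >= d_model (hypothetical workaround:
    zero-pad or project the features up to this size before the transformer)."""
    return -(-d_model // nhead) * nhead  # ceiling division, then scale back up


# Feature sizes quoted above: text + audio + visual.
d_model = 768 + 80 + 35

print(d_model)                       # 883
print(valid_heads(d_model, 7, 15))   # [] since 883 is prime
print(padded_dim(d_model, 14))       # 896
```

Since the list of valid head counts is empty, every num_heads value in [7, 15] triggers the same assertion with d_model=883, unless the features are padded or projected to a different size first.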