linear fusion model
clytze0216 opened this issue
clytze0216 commented
Thank you for sharing. I would like to ask whether you have tried replacing the linear multimodal fusion model, and if so, whether it affects the accuracy. Looking forward to your reply. Thanks a lot!
MIL-VLG commented
We have tested several other fusion models, including element-wise product (eltwise-prod), concatenation (concat), and bilinear pooling. However, none of them brought any improvement. We think the reason may be that multimodal fusion is already performed implicitly in the deep co-attention learning stage.
Since the modification is minor, you can try it yourself. If you find any new results, we would appreciate it if you could share them with us.
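For anyone who wants to try the swap suggested in the reply, here is a minimal NumPy sketch of the four fusion variants mentioned in this thread. It assumes the "linear" fusion sums two linear projections of the attended image feature `x` and question feature `y` and then applies layer normalization. The class name `FusionSketch`, the dimensions, the rank `k`, the random (untrained) weights, and the shared layer normalization on every output are all hypothetical choices for illustration. The bilinear variant is one low-rank factorized form (MFB-style), not necessarily the exact model the authors tested. A real model would implement these as trainable framework modules; NumPy is used here only to keep the sketch dependency-light.

```python
import numpy as np


def layer_norm(z, eps=1e-6):
    # Normalize each row to zero mean and unit variance (no learned gain/bias).
    mu = z.mean(axis=-1, keepdims=True)
    var = z.var(axis=-1, keepdims=True)
    return (z - mu) / np.sqrt(var + eps)


class FusionSketch:
    """Hypothetical fusion heads with random, untrained weights."""

    def __init__(self, d_x, d_y, d_out, k=5, seed=0):
        rng = np.random.default_rng(seed)
        self.d_out, self.k = d_out, k
        # Projections shared by the linear and element-wise product variants.
        self.Wx = rng.standard_normal((d_x, d_out)) * 0.02
        self.Wy = rng.standard_normal((d_y, d_out)) * 0.02
        # Single projection applied to the concatenated features.
        self.Wc = rng.standard_normal((d_x + d_y, d_out)) * 0.02
        # Low-rank bilinear pooling: project each input to d_out * k dims.
        self.Ux = rng.standard_normal((d_x, d_out * k)) * 0.02
        self.Uy = rng.standard_normal((d_y, d_out * k)) * 0.02

    def linear(self, x, y):
        # Sum of two linear projections, then layer norm.
        return layer_norm(x @ self.Wx + y @ self.Wy)

    def eltwise_prod(self, x, y):
        # Element-wise (Hadamard) product of the two projections.
        return layer_norm((x @ self.Wx) * (y @ self.Wy))

    def concat(self, x, y):
        # Concatenate raw features, then project once.
        return layer_norm(np.concatenate([x, y], axis=-1) @ self.Wc)

    def bilinear(self, x, y):
        z = (x @ self.Ux) * (y @ self.Uy)                    # (batch, d_out * k)
        z = z.reshape(-1, self.d_out, self.k).sum(axis=-1)   # sum-pool over k
        z = np.sign(z) * np.sqrt(np.abs(z))                  # power normalization
        return layer_norm(z)


# Usage: every variant maps (batch, d_x) and (batch, d_y) to (batch, d_out),
# so swapping the fusion only changes which method is called.
fusion = FusionSketch(d_x=16, d_y=12, d_out=8, k=3)
rng = np.random.default_rng(1)
x = rng.standard_normal((4, 16))
y = rng.standard_normal((4, 12))
for name in ("linear", "eltwise_prod", "concat", "bilinear"):
    print(name, getattr(fusion, name)(x, y).shape)
```

Because all four heads share the same input and output shapes, the rest of the network does not need to change when one is substituted for another, which is consistent with the reply describing the modification as minor.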