Are the embeddings generated from different modalities comparable?
mengfanShi opened this issue · 2 comments
Just like CLIP, are the embeddings generated by the Universal Encoder comparable across modalities? If so, we could perform search and matching based on the similarity of embeddings from different modal data. Could you provide the Encoder part of the model separately for testing? The full 15GB model is too large at the moment.
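For context, the kind of cross-modal search described above can be sketched with cosine similarity over a shared embedding space. This is a minimal illustration only: the random vectors stand in for hypothetical encoder outputs, since the Universal Encoder's actual embedding API is not shown here.

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
# Stand-ins for encoder outputs; a real test would substitute the
# model's text/audio/video embeddings here.
text_emb = rng.normal(size=512)
video_embs = rng.normal(size=(10, 512))

# Cross-modal retrieval: rank video clips by similarity to the text query.
scores = [cosine_similarity(text_emb, v) for v in video_embs]
best = int(np.argmax(scores))
print("best match index:", best)
```

Whether this works in practice depends on whether the modalities were trained into a shared space, which is exactly the question above.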
Well, since we didn't train the model on exactly paired data, the comparability might not meet your expectations at this time.
Thanks for your attention.
But I see you ran tests on Music-AVQA in the thesis. Could you tell me how you managed to use three modalities to generate answers? Thank you very much!