/multimodal_vtt

Joint Embedding with Multimodal Cues for Cross-Modal Video-Text Retrieval

Primary Language: Python
