The official implementation of "Achieving Cross Modal Generalization with Multimodal Unified Representation" (NeurIPS 2023)
End-to-End Dense Video Captioning with Parallel Decoding (ICCV 2021)
Source code for "Bi-modal Transformer for Dense Video Captioning" (BMVC 2020)