Two-Stage Cross-Modal Encoding Transformer Network (TS-CMETN) for Dense Video Captioning.
Primary language: Python. License: MIT.