Learning Spatiotemporal Features via Video and Text Pair Discrimination
Primary language: Python
License: MIT