simon-ging/coot-videotext

COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning

Python · Apache-2.0