Video-Language Alignment via Spatio-Temporal Graph Transformer (arXiv: https://arxiv.org/abs/2407.11677)
Primary language: Python