Scanning Only Once: An End-to-end Framework for Fast Temporal Grounding in Long Videos
Primary Language: Python