Temporal Language Grounding

Introduction

Task:

Given a query, find the corresponding moment in a given video (the major focus of this repo).

Format

Markdown format:

- [Paper Name](link) - Author 1 et al, `Conference Year`. [[code]](link)
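As a minimal sketch of the entry format above, the helper below assembles one list line from its parts. The function name `format_entry`, its parameters, and the example URLs are hypothetical and exist only for illustration; they are not part of this repo.

```python
def format_entry(title, link, first_author, venue, year, code_link=None):
    """Render one paper entry in the list's Markdown format (illustrative helper)."""
    # Title links to the paper; venue and year go in an inline code span.
    entry = f"- [{title}]({link}) - {first_author} et al, `{venue} {year}`."
    # The [[code]] link is optional and appended only when a repository exists.
    if code_link:
        entry += f" [[code]]({code_link})"
    return entry


if __name__ == "__main__":
    # Placeholder values, not a real paper.
    print(format_entry(
        "Paper Name",
        "https://example.com/paper",
        "Author 1",
        "Conference",
        "Year",
        code_link="https://example.com/code",
    ))
```

Entries without released code simply omit the trailing `[[code]]` link.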

Change Log

2020/07/27: Started the repo.
Papers before 2020 were mainly collected by muketong.

to be updated ...

Keywords used in searching

grounding, retrieval, localization

Papers

Survey

None.

2019

Supervised:

MAC: Mining Activity Concepts for Language-based Temporal Localization - Runzhou Ge et al, WACV 2019. [code]
Multilevel Language and Vision Integration for Text-to-Clip Retrieval - H. Xu et al, AAAI 2019. [code]
Read, Watch, and Move: Reinforcement Learning for Temporally Grounding Natural Language Descriptions in Videos - Dongliang He et al, AAAI 2019.
To Find Where You Talk: Temporal Sentence Localization in Video with Attention Based Location Regression - Y. Yuan et al, AAAI 2019. [code]
Semantic Proposal for Activity Localization in Videos via Sentence Query - S. Chen et al, AAAI 2019.
Localizing natural language in videos - J. Chen et al, AAAI 2019.
ExCL: Extractive Clip Localization Using Natural Language Descriptions - S. Ghosh et al, NAACL 2019.
Cross-Modal Video Moment Retrieval with Spatial and Language-Temporal Attention - B. Jiang et al, ICMR 2019. [code]
Language-Driven Temporal Activity Localization: A Semantic Matching Reinforcement Learning Model - W. Wang et al, CVPR 2019.
MAN: Moment Alignment Network for Natural Language Moment Retrieval via Iterative Graph Adjustment - Da Zhang et al, CVPR 2019.
Cross-Modal Interaction Networks for Query-Based Moment Retrieval in Videos - Zhu Zhang et al, SIGIR 2019. [code]
Semantic Conditioned Dynamic Modulation for Temporal Sentence Grounding in Videos - Yitian Yuan et al, NeurIPS 2019. [code]
DEBUG: A Dense Bottom-Up Grounding Approach for Natural Language Video Localization - Chujie Lu et al, EMNLP 2019.
(still on arxiv as of 2020/06/09) Temporal Localization of Moments in Video Collections with Natural Language - V. Escorcia et al, arxiv 2019.

Weakly Supervised:

Weakly Supervised Video Moment Retrieval From Text Queries - N. C. Mithun et al, CVPR 2019.
Weakly-supervised spatio-temporally grounding natural sentence in video - Zhenfang Chen et al, ACL 2019. [code]
WSLLN: Weakly Supervised Natural Language Localization Networks - M. Gao et al, EMNLP 2019.

2020

Supervised:

Moment Retrieval via Cross-Modal Interaction Networks With Query Reconstruction - Zhijie Lin et al, TIP 2020.
Rethinking the Bottom-Up Framework for Query-based Video Localization - Long Chen et al, AAAI 2020.
Temporally Grounding Language Queries in Videos by Contextual Boundary-aware Prediction - Jingwen Wang et al, AAAI 2020. [code]
Learning 2D Temporal Adjacent Networks for Moment Localization with Natural Language - Songyang Zhang et al, AAAI 2020. [code]
Tree-Structured Policy based Progressive Reinforcement Learning for Temporally Language Grounding in Video - Jie Wu et al, AAAI 2020. [code]
Proposal-free Temporal Moment Localization of a Natural-Language Query in Video using Guided Attention - C. R. Opazo et al, WACV 2020. [code]
Local-Global Video-Text Interactions for Temporal Grounding - Jonghwan Mun et al, CVPR 2020. [code]
Dense Regression Network for Video Grounding - Runhao Zeng et al, CVPR 2020. [code]
Tripping through time: Efficient Localization of Activities in Videos - Meera Hahn et al, BMVC 2020.
Span-based Localizing Network for Natural Language Video Localization - Hao Zhang et al, ACL 2020. [code]
Hierarchical Visual-Textual Graph for Temporal Activity Localization via Language - Shaoxiang Chen et al, ECCV 2020. [code]
Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos - Shaoxiang Chen et al, ECCV 2020.
Jointly Cross- and Self-Modal Graph Attention Network for Query-Based Moment Localization - Daizong Liu et al, MM 2020. [code]
Fine-grained Iterative Attention Network for Temporal Language Localization in Videos - Xiaoye Qu et al, MM 2020.
Dual Path Interaction Network for Video Moment Localization - Hao Wang et al, MM 2020.
Adversarial Video Moment Retrieval by Jointly Modeling Ranking and Localization - Da Cao et al, MM 2020. [code]
STRONG: Spatio-Temporal Reinforcement Learning for Cross-Modal Video Moment Localization - Da Cao et al, MM 2020. [code]
Reinforcement Learning for Weakly Supervised Temporal Grounding of Natural Language in Untrimmed Videos - Jie Wu et al, MM 2020.
Language Guided Networks for Cross-modal Moment Retrieval - Kun Liu et al, arxiv.

Weakly Supervised:

Weakly-Supervised Video Moment Retrieval via Semantic Completion Network - Zhijie Lin et al, AAAI 2020.
VLANet: Video-Language Alignment Network for Weakly-Supervised Video Moment Retrieval - Minuk Ma et al, ECCV 2020.
Two-Stream Consensus Network for Weakly-Supervised Temporal Action Localization - Yuanhao Zhai et al, ECCV 2020.
Regularized Two-Branch Proposal Networks for Weakly-Supervised Moment Retrieval in Videos - Zhu Zhang et al, MM 2020. [code]
Counterfactual Contrastive Learning for Weakly-Supervised Vision-Language Grounding - Zhu Zhang et al, NeurIPS 2020.

2021

Interaction-Integrated Network for Natural Language Moment Localization - Ke Ning et al, TIP 2021.
Boundary Proposal Network for Two-Stage Natural Language Video Localization - Shaoning Xiao et al, AAAI 2021.
Context-Aware Biaffine Localizing Network for Temporal Sentence Grounding - Liu et al, CVPR 2021.
Multi-Modal Relational Graph for Cross-Modal Video Moment Retrieval - Zeng et al, CVPR 2021.
Thinking Fast and Slow: Efficient Text-to-Visual Retrieval With Transformers - Miech et al, CVPR 2021.
Fast Video Moment Retrieval - Gao et al, ICCV 2021.
Hierarchical Deep Residual Reasoning for Temporal Moment Localization - Ma et al, arxiv.

Conferences to be updated:

None
