Summary of papers and projects for visual dialog, video dialog, and multimodal dialog
TCSVT 2023
Heterogeneous Knowledge Network for Visual Dialog link
CVPR 2022
UTC: A Unified Transformer With Inter-Task Contrastive Learning for Visual Dialog link
ArXiv 2022
Modeling Coreference Relations in Visual Dialog link
ICASSP 2022
Improving Cross-Modal Understanding in Visual Dialog Via Contrastive Learning link
Information Processing & Management 2022
HVLM: Exploring Human-Like Visual Cognition and Language-Memory Network for Visual Dialog link
Pattern Recognition 2022
VD-PCR: Improving visual dialog with pronoun coreference resolution link
CVPR 2019
Recursive Visual Attention in Visual Dialog link
CVPR 2017
Visual Dialog link
EMNLP 2022 Findings
Collaborative Reasoning on Multi-Modal Semantic Graphs for Video-Grounded Dialogue Generation link
NAACL 2022
VGNMN: Video-grounded Neural Module Networks for Video-Grounded Dialogue Systems link
EMNLP 2022
Information-Theoretic Text Hallucination Reduction for Video-grounded Dialogue link
AAAI 2022 Workshop
Audio Visual Scene-Aware Dialog Generation with Transformer-based Video Representations link
ICIP 2022
Video-Grounded Dialogues with Joint Video and Image Training link
ECCV 2022
Video Dialog as Conversation about Objects Living in Space-Time link code
TASLP 2021
End-to-End Recurrent Cross-Modality Attention for Video Dialogue link
TASLP 2021
Bridging Text and Video: A Universal Multimodal Transformer for Audio-Visual Scene-Aware Dialog link
AAAI 2021
Dynamic Graph Representation Learning for Video Dialog via Multi-Modal Shuffled Transformers link
AAAI 2021
Structured Co-reference Graph Attention for Video-grounded Dialogue link
ICLR 2021
Learning Reasoning Paths over Semantic Graphs for Video-grounded Dialogues link
TCSVT 2020
Video Dialog via Multi-Grained Convolutional Self-Attention Context Multi-Modal Networks link
NAACL 2020
Video-Grounded Dialogues with Pretrained Generation Language Models link
EMNLP 2020
BiST: Bi-directional Spatio-Temporal Reasoning for Video-Grounded Dialogues link code
ACL 2019
Multimodal Transformer Networks for End-to-End Video-Grounded Dialogue Systems link code
ICASSP 2019
End-to-end Audio Visual Scene-aware Dialog Using Multimodal Attention-based Video Features link
ICASSP 2022
A Non-Hierarchical Attention Network with Modality Dropout for Textual Response Generation in Multimodal Dialogue Systems link
ArXiv 2022
Multimodal Dialog Systems with Dual Knowledge-enhanced Generative Pretrained Language Model link
SIGIR 2021
MMConv: An Environment for Multimodal Conversational Search across Multiple Domains link
ACMMM 2021
Multimodal Dialog System: Relational Graph-based Context-aware Question Understanding link
ACMMM 2020
Multimodal Dialogue Systems via Capturing Context-aware Dependencies of Semantic Elements link