Some papers about *diverse* image (and a few video) captioning

Awesome-Diverse-Captioning

A curated list of papers on diverse captioning, mainly for images, sometimes for video, and occasionally for text. Broadly, diverse visual captioning covers both diverse caption sets (one input to many captions) and distinctive captioning (one single caption), with or without explicit control signals. Dense video captioning is excluded since it has become a subarea of video captioning. More detailed tags will be added later. Feel free to contact me if you have any comments.

Paper List

2022

  1. A Well-Composed Text is Half Done! Composition Sampling for Diverse Conditional Generation

    Shashi Narayan, Gonçalo Simões, Yao Zhao, Joshua Maynez, Dipanjan Das, Michael Collins, Mirella Lapata (Google)

    ACL 2022 [partial code]

    conditional metrics decoding sampling

  2. Hierarchical Sketch Induction for Paraphrase Generation

    Tom Hosking, Hao Tang, Mirella Lapata

    ACL 2022

    controllable VAEs

  3. Generating Scientific Definitions with Controllable Complexity

    Tal August, Katharina Reinecke, Noah A. Smith

    ACL 2022

    controllable definition modeling

  4. CTRLEval: An Unsupervised Reference-Free Metric for Evaluating Controlled Text Generation

    Pei Ke, Hao Zhou, Yankai Lin, Peng Li, Jie Zhou, Xiaoyan Zhu, Minlie Huang

    ACL 2022

    controllable metric

  5. Controllable Dictionary Example Generation: Generating Example Sentences for Specific Targeted Audiences

    Xingwei He, Siu Ming Yiu

    ACL 2022

    controllable

  6. Show, Tell and Rephrase: Diverse Video Captioning via Two-Stage Progressive Training

    Zhu Liu, Teng Wang, Jinrui Zhang, Feng Zheng, Wenhao Jiang, Ke Lu

    TMM 2022

    diversity metric

2021

  1. Human-Like Controllable Image Captioning With Verb-Specific Semantic Roles

    Long Chen, Zhihong Jiang, Jun Xiao, Wei Liu

    CVPR 2021

    controllable

  2. Towards Accurate Text-Based Image Captioning With Content Diversity Exploration

    Guanghui Xu, Shuaicheng Niu, Mingkui Tan, Yucheng Luo, Qing Du, Qi Wu

    CVPR 2021

  3. Open-Book Video Captioning With Retrieve-Copy-Generate Network

    Ziqi Zhang, Zhongang Qi, Chunfeng Yuan, Ying Shan, Bing Li, Ying Deng, Weiming Hu

    CVPR 2021

  4. Question-controlled Text-aware Image Captioning

    Anwen Hu, Shizhe Chen, Qin Jin

    ACM MM 2021

    controllable

  5. O2NA: An Object-Oriented Non-Autoregressive Approach for Controllable Video Captioning (Short)

    Fenglin Liu, Xuancheng Ren, Xian Wu, Bang Yang, Shen Ge, Yuexian Zou, Xu Sun

    ACL 2021

    controllable

  6. Control Image Captioning Spatially and Temporally

    Kun Yan, Lei Ji, Huaishao Luo, Ming Zhou, Nan Duan, Shuai Ma

    ACL 2021

    controllable (mouse traces)

  7. Understanding Guided Image Captioning Performance across Domains

    Edwin G. Ng, Bo Pang, Piyush Sharma, Radu Soricut

    CoNLL 2021

    controllable (semantic label)

  8. Partial Off-Policy Learning: Balance Accuracy and Diversity for Human-Oriented Image Captioning

    Jiahe Shi, Yali Li, Shengjin Wang

    ICCV 2021

    controllable

2020

  1. LNFMM: Latent Normalizing Flows for Many-to-Many Cross Domain Mappings

    Shweta Mahajan, Iryna Gurevych, Stefan Roth

    ICLR 2020 [pytorch-code] [openreview]

  2. Diverse Image Captioning with Context-Object Split Latent Spaces

    Shweta Mahajan, Stefan Roth

    NeurIPS 2020 [pytorch-code] [review]

    diversity

  3. On Diversity in Image Captioning: Metrics and Methods

    Qingzhong Wang, Jia Wan, Antoni B. Chan

    TPAMI 2020 [pytorch-code]

    survey diversity metrics

  4. Improving Image Captioning Evaluation by Considering Inter References Variance

    Yanzhi Yi, Hangyu Deng, Jinglu Hu

    ACL 2020 [code]

    metrics

  5. Better Captioning with Sequence-Level Exploration

    Jia Chen, Qin Jin

    CVPR 2020 [video]

    diversity

2019

  1. POS: Fast, Diverse and Accurate Image Captioning Guided by Part-Of-Speech

    Aditya Deshpande, Jyoti Aneja, Liwei Wang, Alexander Schwing, David Forsyth

    CVPR 2019

    diversity controllable

  2. Generating Diverse and Descriptive Image Captions Using Visual Paraphrases

    Lixin Liu, Jiajun Tang, Xiaojun Wan, Zongming Guo

    ICCV 2019

    descriptiveness

  3. Controllable Video Captioning With POS Sequence Guidance Based on Gated Fusion Network

    Bairui Wang, Lin Ma, Wei Zhang, Wenhao Jiang, Jingwen Wang, Wei Liu

    ICCV 2019 [pytorch-code]

    controllable

  4. VSSI-cap: Variational Structured Semantic Inference for Diverse Image Captioning

    Fuhai Chen, Rongrong Ji, Jiayi Ji, Xiaoshuai Sun, Baochang Zhang, Xuri Ge, Yongjian Wu, Feiyue Huang, Yan Wang

    NeurIPS 2019

    diversity VAE discriminativeness

  5. Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions

    Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

    CVPR 2019 [pytorch-code]

    controllable

  6. Intention Oriented Image Captions with Guiding Objects

    Yue Zheng, Yali Li, Shengjin Wang

    CVPR 2019 [unfinished-code]

    controllable (object labels)

  7. Towards Diverse and Accurate Image Captions via Reinforcing Determinantal Point Process

    Qingzhong Wang, Antoni B. Chan

    arXiv 2019 [pytorch-code]

  8. Curiosity-driven Reinforcement Learning for Diverse Visual Paragraph Generation

    Yadan Luo, Zi Huang, Zheng Zhang, Ziwei Wang, Jingjing Li, Yang Yang

    ACM MM 2019

  9. Engaging Image Captioning via Personality

    Kurt Shuster, Samuel Humeau, Hexiang Hu, Antoine Bordes, Jason Weston

    CVPR 2019 [Openreview for ICLR 19]

  10. Sequential Latent Spaces for Modeling the Intention During Diverse Image Captioning

    Jyoti Aneja, Harsh Agrawal, Dhruv Batra, Alexander Schwing

    ICCV 2019

    diversity VAE

  11. Describing Like Humans: On Diversity in Image Captioning

    Qingzhong Wang, Antoni B. Chan

    CVPR 2019

  12. MSCap: Multi-Style Image Captioning With Unpaired Stylized Text

    Longteng Guo, Jing Liu, Peng Yao, Jiangwei Li, Hanqing Lu

    CVPR 2019

2018

  1. GroupCap: Group-Based Image Captioning With Structured Relevance and Diversity Constraints

    Fuhai Chen, Rongrong Ji, Xiaoshuai Sun, Yongjian Wu, Jinsong Su

    CVPR 2018

  2. A Neural Compositional Paradigm for Image Captioning

    Bo Dai, Sanja Fidler, Dahua Lin

    NeurIPS 2018 [lua-code] [openreview]

  3. Diverse and Coherent Paragraph Generation from Images

    Moitreya Chatterjee, Alexander G. Schwing

    ECCV 2018 [pytorch-code]

  4. Categorizing Concepts With Basic Level for Vision-to-Language

    Hanzhang Wang, Hanli Wang, Kaisheng Xu

    CVPR 2018

2017

  1. Speaking the Same Language: Matching Machine to Human Captions by Adversarial Training

    Rakshith Shetty, Marcus Rohrbach, Lisa Anne Hendricks, Mario Fritz, Bernt Schiele

    ICCV 2017

    diversity GAN

  2. Towards Diverse and Natural Image Descriptions via a Conditional GAN

    Bo Dai, Sanja Fidler, Raquel Urtasun, Dahua Lin

    ICCV 2017 [video]

    GAN

  3. Diverse and Accurate Image Description Using a Variational Auto-Encoder with an Additive Gaussian Encoding Space

    Liwei Wang, Alexander Schwing, Svetlana Lazebnik

    NeurIPS 2017 [Review]

    diversity VAE

  4. Weakly Supervised Dense Video Captioning

    Zhiqiang Shen, Jianguo Li, Zhou Su, Minjun Li, Yurong Chen, Yu-Gang Jiang, Xiangyang Xue

    CVPR 2017

    VAE

  5. From Deterministic to Generative: Multimodal Stochastic RNNs for Video Captioning

    Jingkuan Song, Yuyu Guo, Lianli Gao, Xuelong Li, Alan Hanjalic, Heng Tao Shen

    IEEE Trans Neural Netw Learn Syst 2017

    VAE

2016

  1. Diverse Image Captioning via GroupTalk

    Zhuhao Wang, Fei Wu, Weiming Lu, Jun Xiao, Xi Li, Zitong Zhang, Yueting Zhuang

    IJCAI 2016

  2. Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models

    Ashwin K. Vijayakumar, Michael Cogswell, Ramprasaath R. Selvaraju, Qing Sun, Stefan Lee, David J. Crandall, Dhruv Batra

    CoRR 2016 [lua-code] [demo] [openreview from ICLR'17]

2015

  1. Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)

    Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Zhiheng Huang, Alan Yuille

    ICLR 2015 [code:TF-mRNN] [code:mRNN-CR]

    diversity consensus re-ranking

Main Reference

https://openaccess.thecvf.com/menu

https://openreview.net/