lifeGWT/awesome-image-captioning

A curated list of resources on image captioning and related areas. :-)

Awesome Image Captioning

Contributing

Please feel free to send me pull requests or email (zhjohnchan@gmail.com) to add links. Markdown format:

- [Paper Name](link) - Author 1 et al, `Conference Year`. [[code]](link)
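If you are adding several papers at once, a small helper can keep entries consistent with the format above. This is only a sketch: `format_entry` and its parameter names are hypothetical (not part of this repository), and the URLs in the usage line are placeholders.

```python
def format_entry(title, link, first_author, venue, year, code_link=None):
    """Build one list entry in the Markdown format requested above.

    All names here are illustrative; adapt them to your own workflow.
    """
    # Paper title as a link, then first author and the venue/year in backticks.
    entry = f"- [{title}]({link}) - {first_author} et al, `{venue} {year}`."
    # The [[code]] link is optional and only appended when one is given.
    if code_link:
        entry += f" [[code]]({code_link})"
    return entry


# Placeholder values only; replace with the real paper and repository links.
print(format_entry("Paper Name", "https://example.com/paper",
                   "Author 1", "Conference", "Year",
                   code_link="https://example.com/code"))
```

Paste the printed line into the matching year section and open a pull request.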

Change Log

Nov.13 NeurIPS'18 and AAAI'19 papers updated!
Dec.04 More implementations updated!
Mar.04 Image captioning challenge updated!
Mar.13 CVPR'19 papers updated!
Apr.28 More CVPR'19 papers updated!

Table of Contents

Papers
- Survey
- Before
- 2015
- 2016
- 2017
- 2018
- 2019
Dataset
Image Captioning Challenge
Popular Implementations
- PyTorch
- TensorFlow
- Torch
- Others

Papers

Survey

A Comprehensive Survey of Deep Learning for Image Captioning - Hossain M et al, arXiv preprint 2018.

Before

I2t: Image parsing to text description - Yao B Z et al, P IEEE 2011.
Im2Text: Describing Images Using 1 Million Captioned Photographs - Ordonez V et al, NIPS 2011. [project web]
Deep Captioning with Multimodal Recurrent Neural Networks - Mao J et al, arXiv preprint 2014.

2015

Show and Tell: A Neural Image Caption Generator - Vinyals O et al, CVPR 2015. [code] [code]
Deep Visual-Semantic Alignments for Generating Image Descriptions - Karpathy A et al, CVPR 2015. [project web] [code]
Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation - Chen X et al, CVPR 2015.
Long-term Recurrent Convolutional Networks for Visual Recognition and Description - Donahue J et al, CVPR 2015. [code] [project web]
Guiding the Long-Short Term Memory Model for Image Caption Generation - Jia X et al, ICCV 2015.
Learning like a Child: Fast Novel Visual Concept Learning from Sentence Descriptions of Images - Mao J et al, ICCV 2015. [code]
Expressing an Image Stream with a Sequence of Natural Sentences - Park C C et al, NIPS 2015. [code]
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention - Xu K et al, ICML 2015. [project] [code] [code]
Order-Embeddings of Images and Language - Vendrov I et al, arXiv preprint 2015. [code]
Generating Images from Captions with Attention - Mansimov E et al, arXiv preprint 2015. [code]
Learning FRAME Models Using CNN Filters for Knowledge Visualization - Lu Y et al, arXiv preprint 2015. [code]
Aligning where to see and what to tell: image caption with region-based attention and scene factorization - Jin J et al, arXiv preprint 2015.

2016

Image captioning with semantic attention - You Q et al, CVPR 2016. [code]
DenseCap: Fully Convolutional Localization Networks for Dense Captioning - Johnson J et al, CVPR 2016. [code]
What value do explicit high level concepts have in vision to language problems? - Wu Q et al, CVPR 2016.
SPICE: Semantic Propositional Image Caption Evaluation - Anderson P et al, ECCV 2016. [code]
Image Captioning with Deep Bidirectional LSTMs - Wang C et al, ACMMM 2016. [code]
Multimodal Pivots for Image Caption Translation - Hitschler J et al, ACL 2016.
Image Caption Generation with Text-Conditional Semantic Attention - Zhou L et al, arXiv preprint 2016. [code]
DeepDiary: Automatic Caption Generation for Lifelogging Image Streams - Fan C et al, arXiv preprint 2016.
Learning to generalize to new compositions in image understanding - Atzmon Y et al, arXiv preprint 2016.
Generating captions without looking beyond objects - Heuer H et al, arXiv preprint 2016.
Bootstrap, Review, Decode: Using Out-of-Domain Textual Data to Improve Image Captioning - Chen W et al, arXiv preprint 2016. [code]
Recurrent Image Captioner: Describing Images with Spatial-Invariant Transformation and Attention Filtering - Liu H et al, arXiv preprint 2016.
Recurrent Highway Networks with Language CNN for Image Captioning - Gu J et al, arXiv preprint 2016.

2017

Captioning Images with Diverse Objects - Venugopalan S et al, CVPR 2017. [code]
Top-down Visual Saliency Guided by Captions - Ramanishka V et al, CVPR 2017. [code]
Self-Critical Sequence Training for Image Captioning - Rennie S J et al, CVPR 2017. [code]
Dense Captioning with Joint Inference and Visual Context - Yang L et al, CVPR 2017. [code]
Skeleton Key: Image Captioning by Skeleton-Attribute Decomposition - Wang Y et al, CVPR 2017. [code]
A Hierarchical Approach for Generating Descriptive Image Paragraphs - Krause J et al, CVPR 2017. [code]
Deep Reinforcement Learning-based Image Captioning with Embedding Reward - Ren Z et al, CVPR 2017.
Incorporating Copying Mechanism in Image Captioning for Learning Novel Objects - Yao T et al, CVPR 2017.
Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning - Lu J et al, CVPR 2017. [code]
Attend to You: Personalized Image Captioning with Context Sequence Memory Networks - Park C C et al, CVPR 2017. [code]
SCA-CNN: Spatial and channel-wise attention in convolutional networks for image captioning - Chen L et al, CVPR 2017. [code]
Bidirectional Beam Search: Forward-Backward Inference in Neural Sequence Models for Fill-In-The-Blank Image Captioning - Sun Q et al, CVPR 2017.
Areas of Attention for Image Captioning - Pedersoli M et al, ICCV 2017.
Boosting Image Captioning with Attributes - Yao T et al, ICCV 2017.
An Empirical Study of Language CNN for Image Captioning - Gu J et al, ICCV 2017.
Improved Image Captioning via Policy Gradient Optimization of SPIDEr - Liu S et al, ICCV 2017.
Towards Diverse and Natural Image Descriptions via a Conditional GAN - Dai B et al, ICCV 2017. [code]
Paying Attention to Descriptions Generated by Image Captioning Models - Tavakoli H R et al, ICCV 2017.
Show, Adapt and Tell: Adversarial Training of Cross-domain Image Captioner - Chen T H et al, ICCV 2017. [code]
Image Caption with Global-Local Attention - Li L et al, AAAI 2017.
Reference Based LSTM for Image Captioning - Chen M et al, AAAI 2017.
Attention Correctness in Neural Image Captioning - Liu C et al, AAAI 2017.
Text-guided Attention Model for Image Captioning - Mun J et al, AAAI 2017. [code]
Contrastive Learning for Image Captioning - Dai B et al, NIPS 2017. [code]
Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge - Vinyals O et al, TPAMI 2017. [code]
MAT: A Multimodal Attentive Translator for Image Captioning - Liu C et al, arXiv preprint 2017.
Actor-Critic Sequence Training for Image Captioning - Zhang L et al, arXiv preprint 2017.
What is the Role of Recurrent Neural Networks (RNNs) in an Image Caption Generator? - Tanti M et al, arXiv preprint 2017. [code]
Self-Guiding Multimodal LSTM - when we do not have a perfect training dataset for image captioning - Xian Y et al, arXiv preprint 2017.
Phrase-based Image Captioning with Hierarchical LSTM Model - Tan Y H et al, arXiv preprint 2017.
Show-and-Fool: Crafting Adversarial Examples for Neural Image Captioning - Chen H et al, arXiv preprint 2017.

2018

Neural Baby Talk - Lu J et al, CVPR 2018. [code]
Convolutional Image Captioning - Aneja J et al, CVPR 2018.
Learning to Evaluate Image Captioning - Cui Y et al, CVPR 2018. [code]
Discriminability Objective for Training Descriptive Captions - Luo R et al, CVPR 2018. [code]
SemStyle: Learning to Generate Stylised Image Captions using Unaligned Text - Mathews A et al, CVPR 2018.
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering - Anderson P et al, CVPR 2018. [code]
GroupCap: Group-Based Image Captioning With Structured Relevance and Diversity Constraints - Chen F et al, CVPR 2018.
Unpaired Image Captioning by Language Pivoting - Gu J et al, ECCV 2018.
Recurrent Fusion Network for Image Captioning - Jiang W et al, ECCV 2018.
Rethinking the Form of Latent States in Image Captioning - Dai B et al, ECCV 2018. [code]
Learning to Guide Decoding for Image Captioning - Jiang W et al, AAAI 2018.
Stack-Captioning: Coarse-to-Fine Learning for Image Captioning - Gu J et al, AAAI 2018. [code]
Temporal-difference Learning with Sampling Baseline for Image Captioning - Chen H et al, AAAI 2018.
Partially-Supervised Image Captioning - Anderson P et al, NeurIPS 2018.
A Neural Compositional Paradigm for Image Captioning - Dai B et al, NeurIPS 2018.
Defoiling Foiled Image Captions - Madhyastha P et al, NAACL 2018.
Punny Captions: Witty Wordplay in Image Descriptions - Chandrasekaran A et al, NAACL 2018. [code]
Object Counts! Bringing Explicit Detections Back into Image Captioning - Wang J et al, NAACL 2018.
Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning - Sharma P et al, ACL 2018. [code]
Attacking visual language grounding with adversarial examples: A case study on neural image captioning - Chen H et al, ACL 2018. [code]
Improved Image Captioning with Adversarial Semantic Alignment - Melnyk I et al, arXiv preprint 2018.
Improving Image Captioning with Conditional Generative Adversarial Nets - Chen C et al, arXiv preprint 2018.
CNN+CNN: Convolutional Decoders for Image Captioning - Wang Q et al, arXiv preprint 2018.
Diverse and Controllable Image Captioning with Part-of-Speech Guidance - Deshpande A et al, arXiv preprint 2018.

2019

Unsupervised Image Captioning - Feng Y et al, CVPR 2019. [code]
Engaging Image Captioning Via Personality - Shuster K et al, CVPR 2019.
Pointing Novel Objects in Image Captioning - Li Y et al, CVPR 2019.
Context and Attribute Grounded Dense Captioning - Yin G et al, CVPR 2019.
Auto-Encoding Scene Graphs for Image Captioning - Yang X et al, CVPR 2019.
Self-critical n-step Training for Image Captioning - Gao J et al, CVPR 2019.
Intention Oriented Image Captions with Guiding Objects - Zheng Y et al, CVPR 2019.
Describing like humans: on diversity in image captioning - Wang Q et al, CVPR 2019.
CapSal: Leveraging Captioning to Boost Semantics for Salient Object Detection - Zhang L et al, CVPR 2019. [code]
Fast, Diverse and Accurate Image Captioning Guided By Part-of-Speech - Deshpande A et al, CVPR 2019.
Good News, Everyone! Context driven entity-aware captioning for news images - Biten A F et al, CVPR 2019.
Dense Relational Captioning: Triple-Stream Networks for Relationship-Based Captioning - Kim D et al, CVPR 2019.
Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions - Cornia M et al, CVPR 2019. [code]
Meta Learning for Image Captioning - Li N et al, AAAI 2019.
Learning Object Context for Dense Captioning - Li X et al, AAAI 2019.
Hierarchical Attention Network for Image Captioning - Wang W et al, AAAI 2019.
Deliberate Residual based Attention Network for Image Captioning - Gao L et al, AAAI 2019.
Improving Image Captioning with Conditional Generative Adversarial Nets - Chen C et al, AAAI 2019.
Connecting Language to Images: A Progressive Attention-Guided Network for Simultaneous Image Captioning and Language Grounding - Song L et al, AAAI 2019.

Dataset

MS COCO, LANG: English.
Flickr 8k, LANG: English.
Flickr 30k, LANG: English.
AI Challenger, LANG: Chinese.
Visual Genome, LANG: English.
SBU Captioned Photo Dataset, LANG: English.
IAPR TC-12, LANG: English, German and Spanish.

Image Captioning Challenge

Popular Implementations

PyTorch

TensorFlow

Torch

Others

Licenses

To the extent possible under law, Zhihong Chen has waived all copyright and related or neighboring rights to this work.