Image Captioning Guideline

This is a curated list of Image Captioning papers, datasets and code.

Image captioning is the task of describing an image. To do that, one must recognize the objects, scene, characters and their relationships in the image, and then generate a sentence that expresses the detected elements in natural language. Image captioning is a hard task because it joins two different areas of Artificial Intelligence: Computer Vision and Natural Language Processing.

This repo is organized into surveys, datasets, metrics and then the strategies used to do Image Captioning. It starts with the early proposals of description retrieval and template filling and goes all the way to the use of deep learning techniques, from CNNs combined with RNNs to Transformers used both to build global representations and to generate language.

Surveys

First, we list some surveys to help you discover the area:

Datasets

Papers

Description Retrieval and Rule Based (early strategies)

Deep Learning strategies