This repository collects papers and code for the Visual Storytelling (VIST) task. VIST is a vision-and-language task that aims to summarize the idea of a photo stream and tell a story about it in natural language. Note that it differs from "storytelling with data", which is more closely related to data visualization.
A fuller introduction and more examples are available on the site.
An archive is available on Google Drive (313GB in total). If you cannot access Google Drive, consider contacting authors in your region who work on this task for help.
To evaluate generated stories, automatic metrics such as BLEU, CIDEr, METEOR, and ROUGE are most commonly used.
Evaluation metrics implementation: vist eval. It works with vist api.
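
To give a feel for what an n-gram overlap metric such as BLEU measures, here is a minimal, simplified sketch in pure Python. It is not the vist eval implementation: the function names `ngrams` and `simple_bleu` are hypothetical, tokenization is plain whitespace splitting, only a single reference is supported, and no smoothing is applied, so the numbers will not match those of a standard evaluation toolkit.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=4):
    """Simplified single-reference BLEU (illustrative only, no smoothing)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clipped overlap: each candidate n-gram is credited at most
        # as many times as it occurs in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            # Without smoothing, one zero precision makes the score zero.
            return 0.0
        log_precisions.append(math.log(overlap / total))

    # Brevity penalty: penalize candidates shorter than the reference.
    if len(cand) > len(ref):
        bp = 1.0
    else:
        bp = math.exp(1 - len(ref) / len(cand))

    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    generated = "the family gathered at the park for a picnic"
    ground_truth = "the family gathered at the park for a big picnic"
    print(round(simple_bleu(generated, ground_truth), 3))
```

A generated story that matches the reference exactly scores 1.0 under this sketch, and one that shares no 4-gram with it scores 0.0. That sensitivity to surface wording is one reason overlap metrics are usually reported alongside human evaluation.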
The task also needs human evaluation, usually from the perspectives of "Relevance", "Expressiveness", and "Concreteness", taking AREL (No Metrics Are Perfect, ACL 18) as a reference.
- Feel free to open a pull request or an issue to add papers.
Title | Venue | Type | Code | Star |
---|---|---|---|---|
Commonsense Knowledge Aware Concept Selection For Diverse and Informative Visual Storytelling | AAAI21 | KG | ||
Imagine, Reason and Write: Visual Storytelling with Graph Knowledge and Relational Reasoning | AAAI21 | KG+SG | ||
Title | Venue | Type | Code | Star |
---|---|---|---|---|
Knowledge-Enriched Visual Storytelling | AAAI20 | KG | Pytorch1.3 | 17+ |
What Makes A Good Story? Designing Composite Rewards for Visual Storytelling | AAAI20 | RL | ||
Hide-and-Tell: Learning to Bridge Photo Streams for Visual Storytelling | AAAI20 | En\Decoder | ||
Storytelling from an Image Stream Using Scene Graphs | AAAI20 | SceneGraph | ||
Topic Adaptation and Prototype Encoding for Few-Shot Visual Storytelling | ACMMM20 | Few-Shot | ||
Title | Venue | Type | Code | Star |
---|---|---|---|---|
No Metrics Are Perfect: Adversarial Reward Learning for Visual Storytelling | ACL18 | RL | Pytorch 0.3 | 117+ |
Show, Reward and Tell: Automatic Generation of Narrative Paragraph From Photo Stream by Adversarial Training | AAAI18 | RL | ||
Using Inter-Sentence Diverse Beam Search to Reduce Redundancy in Visual Storytelling | NAACL18 StoryNLP workshop | BS | ||
Adversarial Learning for Visual Storytelling with Sense Group Partition | ACCV18 | RL | ||
GLAC Net: GLocal Attention Cascading Networks for Multi-image Cued Story Generation | NAACL18 StoryNLP workshop | En\Decoder | Pytorch1.0 | 34+ |
Contextualize, Show and Tell: A Neural Visual Storyteller | NAACL18 StoryNLP workshop | En\Decoder | Tensorflow1.0 | 21+ |
A Pipeline for Creative Visual Storytelling | NAACL18 StoryNLP workshop | Other | ||
Stories for Images-in-Sequence by using Visual and Narrative Components | ICTI18 | En\Decoder | Tensorflow1.6 | 34+ |
Title | Venue | Type | Code | Star |
---|---|---|---|---|
Let Your Photos Talk: Generating Narrative Paragraph for Photo Stream | AAAI17 | ATT | ||
Learning Deep Contextual Attention Network for Narrative Photo Stream Captioning | ACM MM17 workshop | FasterRCNN+ATT | ||
Hierarchically-Attentive RNN for Album Summarization and Storytelling | EMNLP17 | En\Decoder | Python | 26+ |
Sort Story: Sorting Jumbled Images and Captions into Stories | EMNLP16 | Other | ||