Implementation of "simNet: Stepwise Image-Topic Merging Network for Generating Detailed and Comprehensive Image Captions"
This code is written in Python 2.7 and requires PyTorch 0.3.
You need to download the pre-trained ResNet-152 model from torchvision for both training and evaluation.
To obtain the topic words, have a look at https://github.com/s-gupta/visual-concepts.
Now we can train our simNet model with:
CUDA_VISIBLE_DEVICES=1,2,3 screen python train.py
We can test our simNet model with:
CUDA_VISIBLE_DEVICES=1,2,3 screen python test.py
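In the commands above, CUDA_VISIBLE_DEVICES=1,2,3 restricts the process to physical GPUs 1, 2 and 3, and screen runs it in a detachable terminal session. Inside the process the visible GPUs are renumbered from 0, which is easy to trip over when reading logs. The sketch below shows that renumbering; the helper names are hypothetical, and it assumes the variable holds integer indices (CUDA also accepts device UUIDs, which are not handled here).

```python
import os


def visible_gpu_ids(env=None):
    """Parse CUDA_VISIBLE_DEVICES into a list of physical GPU indices.

    Returns None when the variable is unset, meaning all GPUs are visible.
    Hypothetical helper; assumes integer indices, not device UUIDs.
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None
    return [int(tok) for tok in raw.split(",") if tok.strip()]


def logical_to_physical(ids):
    """Map the in-process device index (cuda:0, cuda:1, ...) to the physical GPU."""
    return {logical: physical for logical, physical in enumerate(ids)}


if __name__ == "__main__":
    ids = visible_gpu_ids({"CUDA_VISIBLE_DEVICES": "1,2,3"})
    for logical, physical in sorted(logical_to_physical(ids).items()):
        print("cuda:{} -> physical GPU {}".format(logical, physical))
```

To run on a different set of GPUs, change only the list in the command prefix; the scripts themselves keep addressing devices starting from index 0.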
If you have any questions about the code or our paper, please send an email to "lfl@bupt.edu.cn".
If you use this code as part of any published research, please cite the following paper:
@inproceedings{Liu2018simNet,
author = {Fenglin Liu and Xuancheng Ren and Yuanxin Liu and Houfeng Wang and Xu Sun},
title = {simNet: Stepwise Image-Topic Merging Network for Generating Detailed and Comprehensive Image Captions},
booktitle = {EMNLP},
year = {2018}
}
Thanks to the PyTorch team for providing PyTorch 0.3, the CodaLab team for providing online evaluation, the COCO and Flickr30k teams for providing the datasets, Tsung-Yi Lin for providing the evaluation code for MS COCO caption generation, Yufeng Ma for his open-source repositories, and the Torchvision ResNet implementation.