This repository is based on sgrvinod/a-PyTorch-Tutorial-to-Image-Captioning.
This model combines the attention-based model proposed in Show, Attend and Tell with pretrained language models such as BERT, with the aim of incorporating language knowledge into the image captioning task.
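
For readers unfamiliar with the attention step, the following is a minimal, dependency-free sketch of soft attention over image regions for a single decoding step. It is an illustration only, not this repository's code: the function names (`softmax`, `soft_attention`) and the toy inputs are hypothetical, and dot-product scoring is assumed for brevity, whereas Show, Attend and Tell scores regions with a small learned network. The BERT side of the model is not shown here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(region_features, hidden_state):
    """Return (context, weights) for one decoding step.

    region_features: list of feature vectors, one per image region.
    hidden_state: current decoder state, same length as each feature vector.
    Scoring is a plain dot product (an assumption made for this sketch).
    """
    # One relevance score per region.
    scores = [
        sum(f * h for f, h in zip(feat, hidden_state))
        for feat in region_features
    ]
    # Normalize the scores into attention weights that sum to 1.
    weights = softmax(scores)
    # Context vector: attention-weighted average of the region features.
    dim = len(hidden_state)
    context = [
        sum(w * feat[d] for w, feat in zip(weights, region_features))
        for d in range(dim)
    ]
    return context, weights


# Toy example: three image regions with 2-dimensional features.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden = [2.0, 0.0]
context, weights = soft_attention(regions, hidden)
```

In the toy example, the first and third regions align with the decoder state and so receive equal, larger weights than the second region. In a full captioning model, the context vector would be fed to the decoder together with the previous word's embedding at each step.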
Requirements: PyTorch 1.3.1, Python 3.6, NLTK 3.4.5