I've implemented Image captioning with vanilla RNNs and LSTMs on the COCO dataset, which is a deep CNN pretrained to perform image classification on ImageNet.
Aryan05/ImageCaptionGenerator
LSTM and RNN implementation on COCO dataset for Image Cation generator
Jupyter Notebook