Image Captioning: CNN + RNN (Computer Vision + Natural Language Processing)
Lingfeng Zhang (University of Ottawa)
Yu Sun (Carleton University)
Contents: CNN encoders (VGG16, InceptionV3, MobileNet, ResNet) via transfer learning; RNN decoders (stacked LSTM, GRU with attention mechanism); evaluation metrics (BLEU, CIDEr, METEOR); datasets (Flickr8k, COCO); and a Django web application built on the best-performing model.
You need to download the Flickr8k dataset yourself. In addition, run ./Image_captioning_with_visual_attention/Image_captioning_with_visual_attention.py first to generate the annotation files and image datasets.
Run ./Comparison/Comparison.py to get the final results.
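The two steps above have to run in order, so a small driver can help. The following is only a minimal sketch: the script paths come from this README, but the helper names (`build_commands`, `run_pipeline`) are hypothetical, and it assumes the scripts are launched from the repository root after Flickr8k has been downloaded. If the scripts expect a different working directory, adjust accordingly.

```python
import subprocess
import sys
from pathlib import Path

# Script paths as given in this README, relative to the repository root.
PREPARE_SCRIPT = (
    Path("Image_captioning_with_visual_attention")
    / "Image_captioning_with_visual_attention.py"
)
COMPARE_SCRIPT = Path("Comparison") / "Comparison.py"


def build_commands(python=sys.executable):
    """Return the two commands in the order they should run (hypothetical helper)."""
    return [
        [python, str(PREPARE_SCRIPT)],  # step 1: annotation files and image datasets
        [python, str(COMPARE_SCRIPT)],  # step 2: final comparison results
    ]


def run_pipeline(dry_run=True):
    """Print each step; when dry_run is False, also run it and stop on the first failure."""
    for command in build_commands():
        print("Running:", " ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # Dry run by default so nothing is executed by accident.
    run_pipeline(dry_run=True)
```

Call `run_pipeline(dry_run=False)` to actually execute both scripts; `check=True` makes the second step run only if the first one succeeds.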
For the web application, please visit: https://github.com/RichardChangCA/Image_Captioning
-
Attention in Neural Networks, image captioning: https://www.youtube.com/watch?v=W2rWgXJBZhU
-
C5W3L07 Attention Model Intuition, Andrew Ng deeplearning.ai: https://www.youtube.com/watch?v=SysgYptB198
-
C5W3L08 Attention Model, Andrew Ng deeplearning.ai: https://www.youtube.com/watch?v=quoGRI-1l0A
-
Image Captioning with Keras: https://towardsdatascience.com/image-captioning-with-keras-teaching-computers-to-describe-pictures-c88a46a311b8
-
Preparing Image Captioning Dataset: https://machinelearningmastery.com/prepare-photo-caption-dataset-training-deep-learning-model/
-
Image Captioning with Visual Attention: https://www.tensorflow.org/tutorials/text/image_captioning
-
GoogLeNet (Inception_v1): https://medium.com/coinmonks/paper-review-of-googlenet-inception-v1-winner-of-ilsvlc-2014-image-classification-c2b3565a64e7
-
Inception_v3: https://medium.com/@sh.tsang/review-inception-v3-1st-runner-up-image-classification-in-ilsvrc-2015-17915421f77c
-
BLEU score: https://en.wikipedia.org/wiki/BLEU
-
BLEU and METEOR: https://medium.com/explorations-in-language-and-learning/metrics-for-nlg-evaluation-c89b6a781054
-
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention: https://arxiv.org/pdf/1502.03044.pdf
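
As a quick illustration of the BLEU metric listed in the contents, here is a minimal pure-Python sketch of sentence-level BLEU (clipped n-gram precision combined with a brevity penalty). It is not the evaluation code used in this repository: the function names are made up for illustration, it defaults to bigrams (`max_n=2`) rather than the usual 4-grams, and it applies no smoothing, so any zero precision gives a score of 0.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram counts at most as
    often as it appears in the reference where it is most frequent."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def sentence_bleu(candidate, references, max_n=2):
    """Geometric mean of the 1..max_n precisions times the brevity penalty."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # no smoothing in this sketch
    # Reference length closest to the candidate length (ties go to the shorter one).
    ref_len = min((len(ref) for ref in references),
                  key=lambda length: (abs(length - len(candidate)), length))
    if len(candidate) > ref_len:
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - ref_len / len(candidate))
    log_mean = sum(math.log(p) for p in precisions) / max_n
    return brevity_penalty * math.exp(log_mean)


if __name__ == "__main__":
    candidate = "a dog runs on the grass".split()
    references = ["a dog is running on the grass".split()]
    print(sentence_bleu(candidate, references))
```

For real evaluation, an established implementation with smoothing and corpus-level aggregation is a safer choice than this sketch.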