CSI_5386_NLP_Project

Image Captioning: CNN + RNN (Computer Vision + Natural Language Processing)

Lingfeng Zhang (University of Ottawa)

Yu Sun (Carleton University)

Contents: CNN encoders (VGG16, InceptionV3, MobileNet, ResNet) used via transfer learning; RNN decoders (stacked LSTM, GRU with attention mechanism); evaluation metrics (BLEU, CIDEr, METEOR); datasets (Flickr8k, COCO); and a Django web application that serves the best-performing model.
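To illustrate the "GRU with attention mechanism" decoder, here is a minimal pure-Python sketch of additive (Bahdanau-style) soft attention over CNN feature vectors, in the spirit of the TensorFlow tutorial and the Show, Attend and Tell paper linked below. It is not taken from this repository: the function name `additive_attention` and the parameters `W1`, `W2`, and `v` are hypothetical, and the project's actual layers may differ.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [dot(row, vec) for row in matrix]


def additive_attention(features, hidden, W1, W2, v):
    """Hypothetical helper, not the repository's code.

    features: list of L image-region vectors (each of size D) from the CNN encoder
    hidden:   current decoder state (size H)
    W1: A x D matrix, W2: A x H matrix, v: vector of size A (learned in practice)
    Returns (context, weights): the weighted sum of features and the
    attention weight assigned to each region.
    """
    projected_hidden = matvec(W2, hidden)
    scores = []
    for f in features:
        projected_feature = matvec(W1, f)
        combined = [math.tanh(a + b)
                    for a, b in zip(projected_feature, projected_hidden)]
        scores.append(dot(v, combined))
    weights = softmax(scores)  # one weight per region, summing to 1
    dim = len(features[0])
    context = [sum(w * f[i] for w, f in zip(weights, features))
               for i in range(dim)]
    return context, weights
```

At each decoding step the context vector would be combined with the previous word embedding and fed to the GRU, so the decoder can focus on different image regions for different words.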

You need to download Flickr8k yourself. In addition, run ./Image_captioning_with_visual_attention/Image_captioning_with_visual_attention.py first to generate the annotation files and image datasets.

Run ./Comparison/Comparison.py to get the final results.
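BLEU is one of the evaluation metrics listed above. As a rough guide to what such a score measures, the following self-contained sketch computes sentence-level BLEU from clipped n-gram precisions and a brevity penalty. It is an illustration only and an assumption about how a score like this is derived; it is not the code in Comparison.py, it applies no smoothing, and the function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram is credited at most
    as many times as it appears in any single reference."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram])
                  for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def brevity_penalty(candidate, references):
    """Penalize candidates shorter than the closest reference length."""
    c = len(candidate)
    if c == 0:
        return 0.0
    # Closest reference length; ties resolve to the shorter reference.
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    return 1.0 if c > r else math.exp(1 - r / c)


def bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform n-gram weights."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # no smoothing: any zero precision gives a zero score
    log_avg = sum(math.log(p) for p in precisions) / max_n
    return brevity_penalty(candidate, references) * math.exp(log_avg)


if __name__ == "__main__":
    refs = [["a", "dog", "runs", "across", "the", "grass"]]
    cand = ["a", "dog", "runs", "on", "the", "grass"]
    print(bleu(cand, refs, max_n=2))
```

For real evaluation, an established implementation with smoothing and corpus-level aggregation is usually preferable to a hand-rolled one.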

For the web application, please visit: https://github.com/RichardChangCA/Image_Captioning

Some useful study links:

  1. Attention in Neural Networks, image captioning: https://www.youtube.com/watch?v=W2rWgXJBZhU

  2. C5W3L07 Attention Model Intuition, Andrew Ng deeplearning.ai: https://www.youtube.com/watch?v=SysgYptB198

  3. C5W3L08 Attention Model, Andrew Ng deeplearning.ai: https://www.youtube.com/watch?v=quoGRI-1l0A

  4. MobileNet: https://towardsdatascience.com/review-mobilenetv1-depthwise-separable-convolution-light-weight-model-a382df364b69

  5. VGGNet: https://medium.com/coinmonks/paper-review-of-vggnet-1st-runner-up-of-ilsvlc-2014-image-classification-d02355543a11

  6. Image Captioning with keras: https://towardsdatascience.com/image-captioning-with-keras-teaching-computers-to-describe-pictures-c88a46a311b8

  7. Preparing Image Captioning Dataset: https://machinelearningmastery.com/prepare-photo-caption-dataset-training-deep-learning-model/

  8. Image Captioning with Visual Attention: https://www.tensorflow.org/tutorials/text/image_captioning

  9. ResNet: https://medium.com/@14prakash/understanding-and-implementing-architectures-of-resnet-and-resnext-for-state-of-the-art-image-cf51669e1624

  10. GoogLeNet (Inception_v1): https://medium.com/coinmonks/paper-review-of-googlenet-inception-v1-winner-of-ilsvlc-2014-image-classification-c2b3565a64e7

  11. Inception_v3: https://medium.com/@sh.tsang/review-inception-v3-1st-runner-up-image-classification-in-ilsvrc-2015-17915421f77c

  12. BLEU score: https://en.wikipedia.org/wiki/BLEU

  13. BLEU and METEOR: https://medium.com/explorations-in-language-and-learning/metrics-for-nlg-evaluation-c89b6a781054

  14. CIDEr: https://arxiv.org/pdf/1411.5726.pdf

  15. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention: https://arxiv.org/pdf/1502.03044.pdf