Udacity-CVND-Project2-Automated-Image-Captioning

Primary language: HTML. License: MIT.

Objective

This project aims at training a CNN-RNN model to predict captions for a given image. The main task is to implement an effective RNN decoder for a CNN encoder.

Project Overview

The goal of this project is to create a neural network architecture that automatically generates captions from images. See requirements.txt for the required packages.

Important: PyTorch version 0.4.0 is required.

The Microsoft Common Objects in COntext (MS COCO) dataset is used to train the neural network. The final model is then tested on novel images!

Project Instructions

The project is structured as a series of Jupyter notebooks that are designed to be completed in sequential order:

  • 0_Dataset.ipynb
  • 1_Preliminaries.ipynb
  • 2_Training.ipynb
  • 3_Inference.ipynb

The network architecture is implemented in model.py.

Network Architecture

The network architecture consists of two parts:

  1. A CNN encoder, which converts each image into an embedded feature vector.
  2. An RNN decoder, a sequential neural network made up of LSTM units, which translates the feature vector into a sequence of tokens.
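The two parts described above can be sketched in PyTorch roughly as follows. This is a minimal illustration and not the contents of model.py: the class names EncoderCNN and DecoderRNN, the layer sizes, and the tiny convolutional stack (standing in for whatever CNN the project actually uses) are all assumptions made for the example. The decoder feeds the image feature vector to the LSTM as the first input step, followed by the embedded caption tokens.

```python
import torch
import torch.nn as nn


class EncoderCNN(nn.Module):
    """Hypothetical stand-in encoder: image batch -> one feature vector per image."""

    def __init__(self, embed_size):
        super().__init__()
        # Assumption: a tiny conv stack keeps the sketch self-contained.
        # A larger (often pretrained) CNN would normally take its place.
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # -> (batch, 32, 1, 1)
        )
        self.embed = nn.Linear(32, embed_size)

    def forward(self, images):
        x = self.features(images)
        x = x.view(x.size(0), -1)  # (batch, 32)
        return self.embed(x)       # (batch, embed_size)


class DecoderRNN(nn.Module):
    """Hypothetical LSTM decoder: feature vector + caption tokens -> word scores."""

    def __init__(self, embed_size, hidden_size, vocab_size):
        super().__init__()
        self.word_embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, features, captions):
        # Teacher forcing: the image feature is the first input step,
        # followed by every caption token except the last one.
        words = self.word_embed(captions[:, :-1])                  # (batch, T-1, embed)
        inputs = torch.cat([features.unsqueeze(1), words], dim=1)  # (batch, T, embed)
        hidden, _ = self.lstm(inputs)                              # (batch, T, hidden)
        return self.fc(hidden)                                     # (batch, T, vocab)


# Example with random data (sizes are arbitrary).
encoder = EncoderCNN(embed_size=256)
decoder = DecoderRNN(embed_size=256, hidden_size=512, vocab_size=1000)
images = torch.randn(4, 3, 64, 64)
captions = torch.randint(0, 1000, (4, 12))
scores = decoder(encoder(images), captions)
print(scores.shape)  # torch.Size([4, 12, 1000])
```

Each of the 12 output steps holds one score per vocabulary word, so the output can be compared position by position against the 12 caption tokens with a cross-entropy loss during training.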

Results

The following captions were generated by the network for a few images from the COCO test set:

[Images: output1, output2]
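Since the decoder emits a sequence of tokens rather than text, a final step has to map token ids back to words before a caption can be shown. The sketch below shows one way this could look. The helper ids_to_caption, the special tokens <start> and <end>, and the toy vocabulary are assumptions for illustration, not taken from the project's notebooks.

```python
def ids_to_caption(token_ids, idx2word, start="<start>", end="<end>"):
    """Turn predicted token ids into a readable caption string."""
    words = []
    for idx in token_ids:
        word = idx2word[idx]
        if word == start:
            continue  # skip the start marker
        if word == end:
            break     # ignore everything after the end marker
        words.append(word)
    return " ".join(words)


# Toy vocabulary (hypothetical); a real one is built from the training captions.
idx2word = {0: "<start>", 1: "<end>", 2: "a", 3: "dog", 4: "on", 5: "grass"}
caption = ids_to_caption([0, 2, 3, 4, 5, 1, 1], idx2word)
print(caption)  # a dog on grass
```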