/Intro-Image-Captioning-Lab

Lab on Intro to Image Captioning presented at GTC 2019

Primary LanguageJupyter NotebookMIT LicenseMIT

LXAI Intro Image Captioning Lab @ GTC

Header Image

In this lab we'll walkthrough an example of image captioning in Pytorch. We'll begin by downloading and training on the coco image dataset, review data augmentation with croping, rotating, flipping and resizing images. We'll then build a vocabulary for the image annotations and encode the sequences as captions. Our model for this task includes a pretrained encoder CNN to compute features on each image and the decoder RNN which handles the captioning. I'll show you the results of training and evalution from this model including automatically captioning images not included in the training set.

Slides: Image_Captioning_PyTorch.pdf

Run Locally

Local Installation

Step-by-step guides for setting up a local Anaconda environment can be found here:
workshop installation guides directory.

If you already have Pytorch installed on your local machine through an Anaconda distribution, simply clone this repository in your prefered directory, cd into the cloned directory, and type jupyter notebook to get started (it will appear in a new browser window or tab).

  • Clone and cd into this repository

git clone https://github.com/latinxinai/Intro-Image-Captioning-Lab

cd Intro-Image-Captioning

jupyter notebook

If you are not able to launch Jupyter Notebook inside your Pytorch virtual environment execute the following once inside this project directory:

pip install ipykernel python -m ipykernel install --user --name=my-virtualenv-name

Replace "my-virtualenv-name" with the name of your environment.

Get Started

If you've never used PyTorch before, begin at the "Intro to PyTorch" jupyter notebook.

If you are familiar with PyTorch, begin by following instructions to download the dataset below, then, move forward to the "Image Caption Tutorial" jupyter notebook.

Data

Get the Data

To acquire the data for this lab to run locally, you'll need to first install the coco api, then retreive the data from a Google Cloud Servers where it is hosted. We'll be using the 2014 train and val image data sets.

After the coco api has been installed, run the following commands to download the .zip files containing the image data sets.

mkdir coco wget http://msvocds.blob.core.windows.net/annotations-1-0-3/captions_train-val2014.zip -P ./coco/ wget http://images.cocodataset.org/zips/train2014.zip -P ./coco/ wget http://images.cocodataset.org/zips/val2014.zip -P ./coco/

References