Image Caption Validation

This repository contains the code for the Spring 2018 COS 598B final project developed by Ryan McCaffrey and Yannis Karakozis. The architecture of the model is heavily inspired by the following paper:

  • R. Hu, M. Rohrbach, T. Darrell. Segmentation from Natural Language Expressions. In ECCV, 2016. (PDF)
@article{hu2016segmentation,
  title={Segmentation from Natural Language Expressions},
  author={Hu, Ronghang and Rohrbach, Marcus and Darrell, Trevor},
  journal={Proceedings of the European Conference on Computer Vision (ECCV)},
  year={2016}
}

A graphic of the architecture is below: [Model Architecture]
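
At a high level, a VGG-16 branch encodes the image, an LSTM over the caption's word embeddings (learned or GloVe) encodes the text, and the two feature vectors are fused and scored as match / no-match. The snippet below is a minimal TensorFlow 1.x sketch of that flow, not the repository's actual model; the layer sizes, the pooled stand-in for VGG-16, and all variable names are illustrative assumptions.

import tensorflow as tf

# Illustrative sizes; the real model's dimensions may differ.
VOCAB_SIZE, EMBED_DIM, LSTM_DIM, NUM_WORDS = 10000, 300, 1000, 20

image = tf.placeholder(tf.float32, [None, 224, 224, 3])  # input image
words = tf.placeholder(tf.int32, [None, NUM_WORDS])      # caption word ids

# Image branch: a crude pooled/dense stand-in for the VGG-16 feature extractor.
pooled = tf.layers.average_pooling2d(image, pool_size=32, strides=32)
img_feat = tf.layers.dense(tf.layers.flatten(pooled), 1000, activation=tf.nn.relu)

# Text branch: word embeddings (learned here; GloVe in the *_glove variant)
# fed through an LSTM, keeping the final hidden state as the caption encoding.
embedding = tf.get_variable('embedding', [VOCAB_SIZE, EMBED_DIM])
word_vecs = tf.nn.embedding_lookup(embedding, words)
cell = tf.nn.rnn_cell.LSTMCell(LSTM_DIM)
_, state = tf.nn.dynamic_rnn(cell, word_vecs, dtype=tf.float32)
txt_feat = state.h

# Fusion + binary classifier: does the caption describe the image?
fused = tf.concat([tf.nn.l2_normalize(img_feat, 1),
                   tf.nn.l2_normalize(txt_feat, 1)], axis=1)
hidden = tf.layers.dense(fused, 500, activation=tf.nn.relu)
score = tf.layers.dense(hidden, 1)  # logit: positive means "caption matches"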

Installation

  1. Install Google TensorFlow (v1.0.0 or higher) following the instructions here.
  2. Download this repository or clone with Git, and then cd into the root directory of the repository.

Demo

  1. You can request the pre-trained models by contacting the authors at {rm24,ick}@princeton.edu.
  2. Run the model demo in ./demo/text_objseg_cls_glove_demo.ipynb with Jupyter Notebook (IPython Notebook). To run the demo with the learned word embedding instead of GloVe, use ./demo/text_objseg_cls_demo.ipynb.
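
The notebooks expect the pre-trained weights from step 1 to be available locally and presumably restore them as a standard TensorFlow checkpoint. A minimal restore sketch, assuming a placeholder checkpoint path (the actual file names depend on what the authors send you):

import tensorflow as tf

# Hypothetical path: use whichever checkpoint file the authors provide.
CHECKPOINT = './models/caption_validation_model.ckpt'

# The notebooks construct the full model graph before restoring; a single
# variable stands in for it here so the snippet is self-contained.
logits = tf.get_variable('cls_score', shape=[2])

saver = tf.train.Saver()             # collects all variables in the graph
with tf.Session() as sess:
    saver.restore(sess, CHECKPOINT)  # loads the pre-trained weights
    print(sess.run(logits))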

Training and evaluation on COCO Dataset

Download VGG network, GloVe embeddings, COCO files

  1. Download the VGG-16 network parameters trained on the ImageNet 1000 classes:
    models/convert_caffemodel/params/download_vgg_params.sh.
  2. Download the pre-trained GloVe word embeddings using the chakin GitHub repository. Follow the repository instructions to download the glove.6B.50d.txt and glove.6B.300d.txt files, and then place both in the exp-referit/data directory (a minimal download sketch follows this list).
  3. Download the testing and training COCO annotations from the download site. Choose the 2017 Train/Val annotations zip, unpack it, and place the files in coco/annotations. Run the coco/Makefile to set up the MS COCO Python API.
  4. Download the ReferIt dataset: exp-referit/referit-dataset/download_referit_dataset.sh.
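
For step 2, the chakin package exposes a small Python API for fetching pre-trained embeddings. A minimal sketch, assuming you pick the GloVe 6B entry from whatever index chakin.search prints on your machine; after downloading you still need to unpack the archive and move glove.6B.50d.txt and glove.6B.300d.txt into exp-referit/data:

import chakin

# List the available English embeddings and note the index printed for
# the GloVe 6B vectors (the 6B archive contains both the 50d and 300d files).
chakin.search(lang='English')

# Download the GloVe 6B archive into the current directory. The number below
# is an assumption; use the index reported by chakin.search above.
chakin.download(number=2, save_dir='./')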

Training

  1. You may need to add the repository root directory to Python's module path: export PYTHONPATH=.:$PYTHONPATH.
  2. Build training batches: python coco/build_training_batches_cls_coco.py. Check the file for the arguments that control how the batch files are generated; these choices affect how the model trains (see the sketch after this list).
  3. Select the GPU you want to use during training:
    export GPU_ID=<gpu id>. Use 0 for <gpu id> if you only have one GPU on your machine.
  4. Train the caption validation model using one of the following commands:
    • To train with learned word embeddings: python train_cls.py $GPU_ID
    • To train with GloVe embeddings: python train_cls_glove.py $GPU_ID
    • See the top of each training script for details on the other command line arguments needed.
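
For reference on step 2 above, the batch builder presumably pairs each COCO image with one of its own captions (a positive example) and with a caption drawn from a different image (a negative example). The sketch below, using the pycocotools API set up by the coco/Makefile, shows how such pairs can be assembled; the annotation path and the exact pairing scheme are assumptions rather than a description of the script's behavior.

import random
from pycocotools.coco import COCO

# Annotation path assumed from the download step above.
coco = COCO('coco/annotations/captions_train2017.json')
img_ids = coco.getImgIds()

pairs = []  # (image_id, caption, label) with label 1 = caption matches image
for img_id in img_ids[:1000]:  # small subset for illustration
    anns = coco.loadAnns(coco.getAnnIds(imgIds=img_id))
    if not anns:
        continue
    # Positive pair: the image with one of its own captions.
    pairs.append((img_id, random.choice(anns)['caption'], 1))
    # Negative pair: the same image with a caption from another image.
    other_id = random.choice(img_ids)
    other_anns = coco.loadAnns(coco.getAnnIds(imgIds=other_id))
    if other_id != img_id and other_anns:
        pairs.append((img_id, random.choice(other_anns)['caption'], 0))

print(len(pairs), pairs[0])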

Evaluation

  1. Select the GPU you want to use during testing: export GPU_ID=<gpu id>. Use 0 for <gpu id> if you only have one GPU on your machine. Also, you may need to add the repository root directory to Python's module path: export PYTHONPATH=.:$PYTHONPATH.
  2. Run evaluation for the caption validation model: python coco/test_coco_cls.py $GPU_ID. Look inside the file to see the arguments that can be passed when testing the model; they should match the ones used when building the training batches. This should reproduce the results in the paper.
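
The test script reports how well the model separates matching from non-matching image-caption pairs. As a rough illustration of that kind of binary evaluation (placeholder scores and labels below, not the script's actual output):

import numpy as np

# Placeholder model outputs: one logit per (image, caption) pair, and a
# ground-truth label of 1 when the caption actually describes the image.
scores = np.array([2.3, -0.7, 1.1, -1.8, 0.2])
labels = np.array([1, 0, 1, 0, 0])

predictions = (scores > 0).astype(int)   # logit > 0 -> predict "match"
accuracy = (predictions == labels).mean()

true_pos = np.sum((predictions == 1) & (labels == 1))
precision = true_pos / max(predictions.sum(), 1)
recall = true_pos / max(labels.sum(), 1)

print('accuracy=%.2f precision=%.2f recall=%.2f' % (accuracy, precision, recall))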