object_captioning: A Python repository from nqanh

Object Captioning and Retrieval with Natural Language

By Anh Nguyen, Quang D. Tran, Thanh-Toan Do, Ian Reid, Darwin G. Caldwell, Nikos G. Tsagarakis

Requirements
Quick Demo
Training

Requirements

TensorFlow (version > 1.0)
Hardware
- A GPU with ~6GB of memory
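
To confirm the installed TensorFlow satisfies the version requirement above, a small check along these lines may help. This is only a sketch: `version_tuple` and `meets_requirement` are hypothetical helper names, not part of this repo, and the comparison reads "version > 1.0" literally at the major.minor level.

```python
import re


def version_tuple(version):
    """Turn a version string such as '1.4.0-rc1' into a tuple of ints, e.g. (1, 4, 0)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_requirement(version, minimum=(1, 0)):
    """Return True when the major.minor part of `version` is strictly above `minimum`.

    Assumption: 'version > 1.0' is taken literally, so 1.0.x does not pass.
    """
    return version_tuple(version)[:2] > minimum


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed")
    else:
        status = "OK" if meets_requirement(tf.__version__) else "too old"
        print(tf.__version__, status)
```

Run it with the same Python interpreter you plan to use for the demo.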

Quick Demo

Clone the repo to your $PROJECT_PATH folder
Download the pretrained weights from this link, and put them under your $PROJECT_PATH/trained_weight folder
Download the Flickr5k dataset, and put it under your $PROJECT_PATH/data/VOCdevkit2007 folder
Change the project path in file lib/model/config.py: __C.root_folder_path = '$PROJECT_PATH'
Build the lib module: cd $PROJECT_PATH/lib then make
Run the demo: cd $PROJECT_PATH/tool then python demo_caption.py to generate captions for your images
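
Before running the demo, it can be useful to verify that the files and folders named in the steps above are in place. The sketch below only checks the paths mentioned in this README; `EXPECTED_PATHS` and `missing_paths` are hypothetical names used for illustration and are not part of the repo.

```python
import os

# Paths the Quick Demo steps mention, relative to $PROJECT_PATH.
EXPECTED_PATHS = [
    "trained_weight",
    os.path.join("data", "VOCdevkit2007"),
    os.path.join("lib", "model", "config.py"),
    os.path.join("tool", "demo_caption.py"),
]


def missing_paths(project_path):
    """Return the expected relative paths that do not exist under project_path."""
    return [
        rel for rel in EXPECTED_PATHS
        if not os.path.exists(os.path.join(project_path, rel))
    ]
```

For example, `missing_paths(os.environ["PROJECT_PATH"])` returns an empty list when everything is where the steps expect it (assuming you exported `PROJECT_PATH` in your shell).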

Training

We train the network on the Flickr5k dataset
- The Flickr5k dataset needs to be formatted like the Pascal-VOC dataset for training.
- For your convenience, we have already done this. Just download this file (Google Drive) and extract it into your $PROJECT_PATH/data/VOCdevkit2007 folder.
Train the network:
- python $PROJECT_PATH/tool/trainval_net.py
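
If training fails to find the data, checking the extracted folder against the usual Pascal-VOC 2007 layout may help. The sub-folder names below are the standard Pascal-VOC ones; that the Flickr5k archive unpacks to exactly this tree is an assumption, and `VOC_SUBDIRS` and `check_voc_layout` are hypothetical names, not part of the repo.

```python
import os

# Standard Pascal VOC 2007 sub-folders. Assumption: the Flickr5k archive
# follows this layout; the README does not spell it out.
VOC_SUBDIRS = [
    os.path.join("VOC2007", "Annotations"),
    os.path.join("VOC2007", "ImageSets", "Main"),
    os.path.join("VOC2007", "JPEGImages"),
]


def check_voc_layout(devkit_path):
    """Map each assumed sub-folder to True/False depending on whether it exists."""
    return {
        sub: os.path.isdir(os.path.join(devkit_path, sub))
        for sub in VOC_SUBDIRS
    }
```

Call it on your `$PROJECT_PATH/data/VOCdevkit2007` folder; any `False` entry points to a sub-folder that differs from the assumed layout.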

If you find this source code useful in your research, please consider citing:

@article{Nguyen_objcaption,
  author    = {Anh Nguyen and
               Duy Q. Tran and
               Thanh{-}Toan Do and
               Ian D. Reid and
               Darwin G. Caldwell and
               Nikos G. Tsagarakis},
  title     = {Object Captioning and Retrieval with Natural Language},
  journal   = {International Conference on Computer Vision Workshop},
  year      = {2019},
}

License

MIT License

Acknowledgement

This repo uses a lot of source code from Faster-RCNN and AffordanceNet.

Contact

If you have any questions or comments, please send an email to: a.nguyen@ic.ac.uk
