ShowAndTell: A Python repository from serhii-havrylov

"For millions of years mankind lived just like the animals. Then something happened which unleashed the power of our imagination: we learned to talk".

This project reproduces the model from the paper "Show and Tell: A Neural Image Caption Generator".

Image features are the outputs of the relu7 layer of the VGG network, which you can download here. Remove the drop7, fc8, and prob layers from the .prototxt file so that relu7 is the last layer.
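
The layer removal can be done by hand in a text editor. As a rough sketch, it could also be scripted with plain text processing. The helper name strip_layers is hypothetical and not part of this repository, and the sketch assumes that each layer is a top-level brace-delimited block whose first name: field is the layer name:

```python
import re

# Layers to drop so that relu7 becomes the last layer of the network.
DROP = {"drop7", "fc8", "prob"}


def strip_layers(prototxt_text, drop_names=DROP):
    """Return prototxt_text without the top-level blocks named in drop_names.

    Assumption: braces never appear inside quoted strings, and the first
    name: "..." field inside a block is the name of that layer.
    """
    out = []      # lines and blocks that are kept
    block = []    # lines of the block currently being read
    depth = 0     # current brace nesting depth
    for line in prototxt_text.splitlines():
        opens, closes = line.count("{"), line.count("}")
        if depth == 0 and opens == 0:
            out.append(line)  # top-level field such as name: or input_dim:
            continue
        block.append(line)
        depth += opens - closes
        if depth == 0:  # the top-level block has just been closed
            text = "\n".join(block)
            match = re.search(r'name:\s*"([^"]+)"', text)
            if not (match and match.group(1) in drop_names):
                out.append(text)
            block = []
    return "\n".join(out)

# Possible usage (file names here are placeholders, not files from the repo):
#   with open("deploy.prototxt") as src:
#       trimmed = strip_layers(src.read())
#   with open("deploy_relu7.prototxt", "w") as dst:
#       dst.write(trimmed)
```

It is worth checking the trimmed file by eye before using it, since .prototxt layouts differ between Caffe versions.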

You can download the prepared training and validation data from my Google Drive, or you can reproduce the image/text feature extraction pipeline as follows:

Download the datasets:
- MSCOCO
- Flickr8k (access to the data must be requested)
- Flickr30k (access to the data must be requested)
Run the Python scripts that generate files storing the image paths and their corresponding captions:
- run data_preparation/flickr/flickr8k/build_image_text_match.py
- run data_preparation/flickr/flickr30k/build_image_text_match.py
- run data_preparation/mscoco/build_image_text_match.py
Run the Python scripts that generate files storing the image features:
- run data_preparation/flickr/extract_features.py
- run data_preparation/mscoco/extract_features.py
Run the Python script that generates the training and validation data:
- run data_preparation/merge_all_data.py
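
The script order above can be captured in a small driver. This is only a sketch: run_pipeline is a hypothetical helper, not part of the repository, and it assumes that the scripts are launched from the repository root, take no command-line arguments, and have their dataset paths configured inside the scripts themselves.

```python
import subprocess
import sys

# Scripts in the order listed above: caption matching, feature extraction, merge.
PIPELINE = [
    "data_preparation/flickr/flickr8k/build_image_text_match.py",
    "data_preparation/flickr/flickr30k/build_image_text_match.py",
    "data_preparation/mscoco/build_image_text_match.py",
    "data_preparation/flickr/extract_features.py",
    "data_preparation/mscoco/extract_features.py",
    "data_preparation/merge_all_data.py",
]


def run_pipeline(scripts=PIPELINE, runner=subprocess.check_call):
    """Run each script with the current interpreter, stopping at the first failure.

    runner is injectable so the order can be checked without executing anything.
    """
    for script in scripts:
        runner([sys.executable, script])

# Possible usage, from the repository root:
#   run_pipeline()
```

With the default runner, subprocess.check_call raises CalledProcessError as soon as a script exits with a non-zero status, so a failed step is not followed by later steps that depend on its output.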

To train the model, run caption_generation_model/train.py, or download the pretrained model from my Google Drive.

If you want to use the pretrained model, run the minimal Flask app caption_generation_server/app.py (note: it requires Caffe and its Python interface, pycaffe, to be installed).
