A picture is worth a thousand (coherent) words

Implementation of the CNN-RNN architecture for image caption generation proposed in the Show and Tell paper (see References).

Getting started

git clone https://github.com/mmilunovic/a-picture-is-a-thousand-words.git
cd a-picture-is-a-thousand-words
pip install -r requirements.txt

Usage

with open("test-image.jpg", "rb") as f:
    apply_model_to_image_raw_bytes(f.read())
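
To caption several images at once, the call above can be wrapped in a small loop. This is a minimal sketch: `caption_directory` is a hypothetical helper, not part of the repository. It takes the captioning function as an argument because the module that exports `apply_model_to_image_raw_bytes` is not stated here, and it assumes that function returns the caption as a string.

```python
from pathlib import Path
from typing import Callable, Dict


def caption_directory(
    image_dir: str,
    caption_fn: Callable[[bytes], str],
    pattern: str = "*.jpg",
) -> Dict[str, str]:
    """Apply caption_fn to the raw bytes of every matching file in image_dir.

    Hypothetical helper: caption_fn is assumed to accept raw image bytes
    and return a caption string.
    """
    captions = {}
    # Sort so the result order does not depend on the file system.
    for path in sorted(Path(image_dir).glob(pattern)):
        captions[path.name] = caption_fn(path.read_bytes())
    return captions
```

With the model loaded, a call such as `caption_directory("images", apply_model_to_image_raw_bytes)` would return a dictionary that maps each file name to its caption, provided the assumption about the return value holds.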

Training the model yourself

To train the model yourself, download the training and validation datasets and place them in the train_data and test_data directories:

Train images
Validation images
Captions for both train and validation
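
Before starting a training run, it can help to confirm that both directories exist and contain files. The following is a minimal sketch that assumes only the directory names mentioned above. `missing_data_dirs` is a hypothetical helper, not part of the repository, and it does not check the file layout that the training code expects.

```python
from pathlib import Path
from typing import Iterable, List


def missing_data_dirs(
    root: str = ".",
    names: Iterable[str] = ("train_data", "test_data"),
) -> List[str]:
    """Return the names of data directories under root that are absent or empty."""
    missing = []
    for name in names:
        directory = Path(root) / name
        # A directory counts as missing if it does not exist or has no entries.
        if not directory.is_dir() or not any(directory.iterdir()):
            missing.append(name)
    return missing
```

An empty list means both directories are present and non-empty. Any returned name points at a dataset that still needs to be downloaded or unpacked.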

References

Show And Tell Paper - Original paper
Advanced Machine Learning Course - Final project for this course