
Image caption generator that uses a CNN as the encoder and an RNN as the decoder.


A picture is worth a thousand (coherent) words

Implementation of the CNN-RNN architecture for image caption generation proposed in this paper.
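The encoder-decoder idea can be illustrated with a minimal NumPy sketch. It is not this repository's model: the tiny vocabulary, the layer sizes, the random weights, the single convolution layer standing in for a pretrained CNN, and the vanilla RNN cell (a real captioner would more typically use an LSTM) are all assumptions made for illustration. The CNN turns the image into a feature vector that initialises the RNN's hidden state. The RNN then emits one word per step, greedily picking the highest-scoring word until it produces `<end>`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabulary and sizes (assumptions, not the repository's values).
VOCAB = ["<start>", "<end>", "a", "dog", "on", "the", "grass"]
EMBED, HIDDEN, N_FILTERS = 8, 16, 4


def conv2d_valid(image, kernels):
    """Single-channel 'valid' 2-D cross-correlation, one output map per kernel."""
    k = kernels.shape[-1]
    h, w = image.shape
    out = np.empty((kernels.shape[0], h - k + 1, w - k + 1))
    for f, kern in enumerate(kernels):
        for i in range(h - k + 1):
            for j in range(w - k + 1):
                out[f, i, j] = np.sum(image[i:i + k, j:j + k] * kern)
    return out


def encode(image, params):
    """CNN encoder: conv -> ReLU -> global average pool -> projection to HIDDEN."""
    feats = np.maximum(conv2d_valid(image, params["kernels"]), 0.0)
    pooled = feats.mean(axis=(1, 2))           # shape (N_FILTERS,)
    return np.tanh(params["W_img"] @ pooled)    # initial RNN hidden state, shape (HIDDEN,)


def decode_greedy(h, params, max_len=10):
    """RNN decoder: feed back the previous word and pick the most likely next word."""
    token = VOCAB.index("<start>")
    caption = []
    for _ in range(max_len):
        x = params["embed"][token]                             # word embedding, (EMBED,)
        h = np.tanh(params["W_xh"] @ x + params["W_hh"] @ h)   # vanilla RNN cell update
        logits = params["W_hy"] @ h                            # one score per vocabulary word
        token = int(np.argmax(logits))                         # greedy choice
        if VOCAB[token] == "<end>":
            break
        caption.append(VOCAB[token])
    return caption


# Random, untrained parameters; training would learn these from image-caption pairs.
params = {
    "kernels": rng.normal(size=(N_FILTERS, 3, 3)),
    "W_img": rng.normal(size=(HIDDEN, N_FILTERS)),
    "embed": rng.normal(size=(len(VOCAB), EMBED)),
    "W_xh": rng.normal(size=(HIDDEN, EMBED)) * 0.1,
    "W_hh": rng.normal(size=(HIDDEN, HIDDEN)) * 0.1,
    "W_hy": rng.normal(size=(len(VOCAB), HIDDEN)),
}

image = rng.random((8, 8))  # stand-in for a small grayscale image
caption = decode_greedy(encode(image, params), params)
print(" ".join(caption))
```

With untrained weights the printed caption is meaningless; the point is the data flow from pixels to a feature vector to a word sequence. Using the image features as the initial hidden state is one common way to condition the decoder; feeding them as the RNN's first input is another.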

Google AI Blog post about this problem.

Getting started

git clone https://github.com/mmilunovic/a-picture-is-a-thousand-words.git
cd a-picture-is-a-thousand-words
pip install -r requirements.txt

Usage

Pass the raw bytes of an image file to apply_model_to_image_raw_bytes:

with open("test-image.jpg", "rb") as f:
    apply_model_to_image_raw_bytes(f.read())

Training the model yourself

To train the model yourself, download the training and validation datasets and place them in the train_data and test_data directories, respectively.
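Before training, a quick sanity check can confirm that the data ended up where expected. The helper below is hypothetical (check_dataset_dirs and count_images are not part of this repository) and assumes the images are stored as .jpg, .jpeg or .png files anywhere under each directory; caption or annotation files are ignored by the count.

```python
from pathlib import Path

# Assumed image extensions; adjust to match the downloaded datasets.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images(directory):
    """Count image files anywhere under `directory` (0 if it does not exist)."""
    path = Path(directory)
    if not path.is_dir():
        return 0
    return sum(
        1 for p in path.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def check_dataset_dirs(root="."):
    """Return image counts for train_data and test_data; raise if either is empty or missing."""
    counts = {name: count_images(Path(root) / name) for name in ("train_data", "test_data")}
    missing = [name for name, n in counts.items() if n == 0]
    if missing:
        raise FileNotFoundError(f"No images found in: {', '.join(missing)}")
    return counts
```

Calling check_dataset_dirs() from the repository root returns a dict such as {"train_data": ..., "test_data": ...}, or raises FileNotFoundError naming the directories that are still empty.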

References