Scene Description using GPT-2

Source code and documentation for a model that generates a caption for a given image.

MSCOCO dataset: n_samples = 13500

To get the dataset:

bash getData.sh

We recommend Anaconda or Miniconda.
You can set up your environment (x86_64) by entering:

conda env create -f environment.yml

If you prefer pip, enter the following in your venv:

pip install -r requirements.txt

To train the model:

python train.py --p /path/to/training/data --a /path/to/annotations

If you are on Ubuntu or a similar distribution, use python3 instead of python.

train.py also supports additional flags, which you can list by entering:

python train.py --help
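
For readers curious how the --p and --a flags could be wired up, here is a minimal sketch using Python's standard argparse module. It is not taken from train.py: the function name build_parser, the help strings, and the choice to make both flags required are assumptions for illustration, and the real script may define its flags differently.

```python
import argparse


def build_parser():
    # Hypothetical parser; the actual train.py may differ.
    parser = argparse.ArgumentParser(
        description="Train the captioning model (illustrative sketch)."
    )
    # --p: location of the training data (assumed to be required).
    parser.add_argument("--p", required=True, help="path to the training data")
    # --a: location of the annotations (assumed to be required).
    parser.add_argument("--a", required=True, help="path to the annotations")
    return parser


# Parse an explicit argument list so the sketch runs without a command line.
args = build_parser().parse_args(
    ["--p", "/path/to/training/data", "--a", "/path/to/annotations"]
)
print(args.p)
print(args.a)
```

argparse generates the --help output automatically from the help strings, which is why a parser built this way can list its flags without extra code.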

To run scene captioning:

python scene_captioning.py