gan-image-captioning

Project carried out as part of my master's thesis in Computer Science at UMONS (University of Mons, Belgium).

Supervisor: Jean-Benoit DELBROUCK

Co-supervisor: Dr Stéphane DUPONT

Reviewers: Dr Hadrien MELOT & Adrien COPPENS

Our model is available here.

Environment preparation

Our environment uses Python 3.6.8 with Anaconda. All required libraries can be installed by running:

$ pip install -r reqs
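
If you use Anaconda, you can create a matching environment first, for example (the environment name gan-captioning is only an illustration):

$ conda create -n gan-captioning python=3.6.8
$ conda activate gan-captioning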

Data set preparation

We use MS COCO as our data set. We provide two methods to prepare it for training our models:

  1. Via script: prepare the data set by running our formatting script.
  2. Via downloading: download a ready-to-use version of the data set.

Both procedures yield the same data set as the one we used; you can adapt it to your own needs.

Method 1 - Via script

  1. Create a folder cocodataset/images.
  2. Download the images from the 2014 release into cocodataset/images (download both the 2014 Train images and the 2014 Val images).
  3. (For Windows users) Download the MS COCO annotations zip file into the root of the project.
  4. Run the following commands (a sketch of what the vocabulary step does follows this list):
python prepare_coco.py
python vocab.py cocodataset/captions/train.en coco_vocab.en 4000
  5. Move the generated coco_vocab.en file into the cocodataset folder.
  6. Create a folder cocodataset/embeddings.
  7. Download the GloVe 6B.300d embeddings into cocodataset/embeddings.
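
For intuition, here is a minimal sketch of what the vocabulary step above typically does, assuming vocab.py keeps the 4000 most frequent tokens of the caption file; the actual script in this repository may differ:

    import sys
    from collections import Counter

    def build_vocab(captions_path, vocab_path, size):
        # Count whitespace-separated tokens across all captions.
        counts = Counter()
        with open(captions_path, encoding='utf-8') as f:
            for line in f:
                counts.update(line.strip().split())
        # Write the `size` most frequent tokens, one per line.
        with open(vocab_path, 'w', encoding='utf-8') as f:
            for token, _ in counts.most_common(size):
                f.write(token + '\n')

    if __name__ == '__main__':
        build_vocab(sys.argv[1], sys.argv[2], int(sys.argv[3]))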

Method 2 - Via downloading

  1. Download the data set here.
  2. Extract the content of the zip file into the root of the project.

Model configuration

All the configuration parameters of this project are in the config.json file. You can create your own copy of the configuration or modify this one.

Note: the default parameters are the ones that gave us the best results.

The load_dict parameter is used to test a specific model. The default value corresponds to the model that gave our best results; this model is available here.
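
Since the configuration is plain JSON, it can be inspected programmatically. A small sketch; treating load_dict as a checkpoint path is an assumption based on its role described above:

    import json
    import torch

    with open('config.json') as f:
        config = json.load(f)

    # `load_dict` selects the saved model to test; loading its weights
    # would then look roughly like this (exact usage is an assumption).
    state_dict = torch.load(config['load_dict'], map_location='cpu')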

The model can be selected by uncommenting the corresponding line in main.py (a sketch of the gradient penalty used by the WGAN-GP variants follows this list):

  • WGANBase is the basic WGAN
  • WGANGP is the WGAN with gradient penalty
  • WGANLip is the WGAN with alternative gradient penalty
  • WGAN is the WGAN with gradient penalty and clipping
  • RelativisticGAN is the RGAN
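
As background for the variants above, the standard WGAN-GP penalty (Gulrajani et al., 2017) constrains the critic's gradient norm on points interpolated between real and generated samples. A minimal PyTorch sketch of that term, not necessarily identical to this repository's implementation:

    import torch

    def gradient_penalty(critic, real, fake, lambda_gp=10.0):
        # Random interpolation between real and generated samples.
        alpha = torch.rand(real.size(0), *([1] * (real.dim() - 1)), device=real.device)
        interp = (alpha * real + (1 - alpha) * fake).requires_grad_(True)
        # Gradient of the critic's score with respect to the interpolated input.
        scores = critic(interp)
        grads, = torch.autograd.grad(scores, interp,
                                     grad_outputs=torch.ones_like(scores),
                                     create_graph=True)
        # Penalize any deviation of the gradient norm from 1.
        norms = grads.view(grads.size(0), -1).norm(2, dim=1)
        return lambda_gp * ((norms - 1) ** 2).mean()

The clipping variant instead clamps the critic's weights to a small interval after each update, as in the original WGAN.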

Model training

To train the model, execute the following command:

$ python main.py config.json

Model test

To test the model, execute the following command:

$ python test.py config.json

This script will create two output files:

  • output_argmax: contains the generated captions obtained by selecting the arg-max word at each step (see the sketch after this list).
  • output_beam: contains the generated captions obtained by using beam search.
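
To illustrate the difference, here is a minimal greedy (arg-max) decoding loop; the decoder interface used here (last token, image features, hidden state) is an assumption, and the actual beam search keeps the k most probable partial captions at each step instead of a single word:

    import torch

    def greedy_decode(decoder, features, bos_id, eos_id, max_len=20):
        # Start from <bos> and repeatedly append the single most
        # probable next word until <eos> or the length limit.
        tokens = [bos_id]
        hidden = None
        for _ in range(max_len):
            logits, hidden = decoder(torch.tensor([tokens[-1]]), features, hidden)
            next_id = int(logits.argmax(dim=-1))
            if next_id == eos_id:
                break
            tokens.append(next_id)
        return tokens[1:]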

To get the Top 5 and Flop 5 captions for one of these files, execute the following command:

$ python evaluator.py cocodataset/captions/beam.en output_argmax cocodataset/links/beam.txt cocodataset/images

This will generate ten PNG files containing these results.

Notes

  • As explained here, reproducibility is not guaranteed across PyTorch releases, individual commits, or different platforms. Furthermore, results need not be reproducible between CPU and GPU executions, even when using identical seeds (a seeding sketch follows this list).
  • The training data set is limited to 50,000 instances. This limit can be removed by modifying:
    • datasets/caption.py:
     if mode == 'train':
         captions = captions[:50000]
    • datasets/numpy.py:
     if mode == 'train':
         self.data = self.data[:50000]
    • datasets/text.py:
     if mode == 'train':
         self.data = self.data[:50000]
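
The usual PyTorch seeding steps below make runs as repeatable as possible on a single setup; they reduce, but do not eliminate, the nondeterminism described above:

    import random
    import numpy as np
    import torch

    def set_seed(seed):
        # Seed every source of randomness used during training.
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        # Prefer deterministic cuDNN kernels over faster nondeterministic ones.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False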