Project carried out during my master's thesis in Computer Science at UMONS (University of Mons, Belgium).
Director: Jean-Benoit DELBROUCK
Co-director: Dr Stéphane DUPONT
Reporters: Dr Hadrien MELOT & Adrien COPPENS
Our model is available here.
Our environment uses Python 3.6.8 with Anaconda. All the required libraries can be installed by running:
$ pip install -r reqs
We use MS COCO as our data set. We propose two methods to format it for training our models:
- Via script: prepare the data set by running our formatting script.
- Via download: download a ready-to-use version of the data set.

Both procedures yield the same data set as the one we used. You can adapt it for your own needs.
Method 1 - Via script
- Create a folder `cocodataset/images`.
- Download the images from the 2014 release into `cocodataset/images` (download both 2014 Train images and 2014 Val images).
- (For Windows users) Download the MS COCO annotations zip file into the root of the project.
- Run the following commands:

    python prepare_coco.py
    python vocab.py cocodataset/captions/train.en coco_vocab.en 4000

- Move the generated `coco_vocab.en` file into the `cocodataset` folder.
- Create a folder `cocodataset/embeddings`.
- Download the GloVe embeddings 6B.300d into `cocodataset/embeddings`.
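The vocabulary command above keeps the most frequent caption tokens (4000 here). A minimal sketch of this kind of frequency-cutoff vocabulary, not the actual `vocab.py` (the special tokens are an assumption):

```python
from collections import Counter

def build_vocab(lines, max_size):
    """Count whitespace tokens and keep the most frequent ones."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    # Special tokens first (hypothetical names), then the most common words.
    vocab = ["<pad>", "<unk>", "<bos>", "<eos>"]
    vocab += [w for w, _ in counts.most_common(max_size - len(vocab))]
    return {w: i for i, w in enumerate(vocab)}

captions = ["a dog runs", "a cat sleeps", "a dog barks"]
vocab = build_vocab(captions, 6)
print(vocab["a"])  # 4 (first slot after the four special tokens)
```

Words outside the kept vocabulary would then be mapped to the unknown token during training.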
Method 2 - Via downloading
- Download the data set here.
- Extract the content of the zip file into the root of the project.
All the configuration parameters of this project are in the `config.json` file. You can create your own copy of the configuration or modify this one.
Note: the default parameters are the ones that gave us the best results.
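A copy of the configuration can be created and loaded like any JSON file. The parameter names below are purely hypothetical, shown only to illustrate the pattern; the real `config.json` defines the project's own keys:

```python
import json
import os
import tempfile

# Hypothetical parameters, for illustration only.
sample = {"batch_size": 64, "epochs": 20, "load_dict": "best_model.pt"}

# Write a configuration copy to a temporary location.
path = os.path.join(tempfile.mkdtemp(), "config.json")
with open(path, "w") as f:
    json.dump(sample, f, indent=2)

# Load it back, as a script receiving the path on the command line would.
with open(path) as f:
    config = json.load(f)
print(config["batch_size"])  # 64
```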
The `load_dict` parameter is used to test a specific model. The default value is the one corresponding to our best results. This model is available here.
The model can be selected by uncommenting the corresponding line in `main.py`:
- WGANBase is the basic WGAN
- WGANGP is the WGAN with gradient penalty
- WGANLip is the WGAN with alternative gradient penalty
- WGAN is the WGAN with gradient penalty and clipping
- RelativisticGAN is the RGAN
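For reference, the gradient-penalty variants above differ in how the critic is constrained. The standard WGAN-GP objective (Gulrajani et al., 2017) penalizes the critic's gradient norm on samples $\hat{x}$ interpolated between real and generated data:

```latex
L = \mathbb{E}_{\tilde{x} \sim \mathbb{P}_g}\big[D(\tilde{x})\big]
  - \mathbb{E}_{x \sim \mathbb{P}_r}\big[D(x)\big]
  + \lambda \, \mathbb{E}_{\hat{x} \sim \mathbb{P}_{\hat{x}}}
      \Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big]
```

The basic WGAN enforces the Lipschitz constraint by weight clipping instead of the penalty term.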
To train the model, execute the following command:
$ python main.py config.json
To test the model, execute the following command:
$ python test.py config.json
This script will create two output files:
- `output_argmax`: contains the generated captions obtained by selecting the arg max word at each step.
- `output_beam`: contains the generated captions obtained by using beam search.
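The difference between the two files comes from the decoding strategy. Arg max (greedy) decoding picks the single best word at every step, while beam search keeps the k best partial captions. A toy sketch of greedy decoding, not the project's actual decoder:

```python
def argmax_decode(step_scores, id_to_word):
    """Greedy decoding: pick the highest-scoring word at each time step."""
    caption = []
    for scores in step_scores:
        best = max(range(len(scores)), key=scores.__getitem__)
        caption.append(id_to_word[best])
    return caption

# Toy scores over a 3-word vocabulary for a 2-step caption.
scores = [[0.1, 0.7, 0.2],
          [0.5, 0.3, 0.2]]
vocab = ["a", "dog", "runs"]
print(argmax_decode(scores, vocab))  # ['dog', 'a']
```

Greedy decoding is fast but can commit to a locally good word that leads to a globally worse caption, which is why beam search usually scores better.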
To get the Top 5 and Flop 5 for one of these files, execute the following command:
$ python evaluator.py cocodataset/captions/beam.en output_argmax cocodataset/links/beam.txt cocodataset/images
That will generate 10 PNG files containing these results.
- As explained here, reproducibility is not guaranteed across PyTorch releases, individual commits or different platforms. Furthermore, results need not be reproducible between CPU and GPU executions, even when using identical seeds.
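Within a single platform and PyTorch build, seeding every random number generator (e.g. `torch.manual_seed`, Python's `random.seed`, NumPy's seed) makes runs repeatable, even though it does not guarantee identical results across platforms or CPU/GPU. A stdlib-only illustration of the principle, identical seed yielding an identical stream:

```python
import random

def seeded_samples(seed, n=3):
    """Draw n pseudo-random floats from a generator with a fixed seed."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

a = seeded_samples(42)
b = seeded_samples(42)
print(a == b)  # True: same seed, same sequence
```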
- The training data set is limited to 50,000 instances. Removing this limit is possible by modifying:
  - `datasets/caption.py`: `if mode == 'train': captions = captions[:50000]`
  - `datasets/numpy.py`: `if mode == 'train': self.data = self.data[:50000]`
  - `datasets/text.py`: `if mode == 'train': self.data = self.data[:50000]`
- the