
Subjective Image Captioning using Capsule Generative Adversarial Network

Primary language: Python. License: MIT.



We pretrain our models on the Microsoft COCO Dataset, then train them on the SentiCap Dataset.


  1. python 3.7.4
  2. numpy 1.18.1
  3. hickle 3.4.6
  4. scikit-image 0.16.2
  5. tensorflow 1.14 or tensorflow-gpu 1.14
  6. tqdm 4.44.1
  7. torch 1.4.0
  8. matplotlib 3.1.3
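
The pinned versions above can be checked before training. The following is a minimal sketch, not part of the repository: `PINNED` and `check_versions` are hypothetical names, and it assumes each package exposes a `__version__` attribute on its import name (scikit-image imports as `skimage`, and tensorflow-gpu imports as `tensorflow`).

```python
import importlib

# Pinned versions copied from the list above, keyed by import name.
# Assumption: scikit-image imports as "skimage"; tensorflow-gpu as "tensorflow".
PINNED = {
    "numpy": "1.18.1",
    "hickle": "3.4.6",
    "skimage": "0.16.2",
    "tensorflow": "1.14",
    "tqdm": "4.44.1",
    "torch": "1.4.0",
    "matplotlib": "3.1.3",
}


def check_versions(pinned):
    """Return {module: (expected, found)} for each missing or mismatched module."""
    problems = {}
    for name, expected in pinned.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            # Module is not installed at all.
            problems[name] = (expected, None)
            continue
        found = getattr(module, "__version__", None)
        # Prefix match, so a pin of "1.14" accepts an installed "1.14.0".
        if found is None or not found.startswith(expected):
            problems[name] = (expected, found)
    return problems
```

Calling `check_versions(PINNED)` returns an empty dict when every package is installed at the pinned version; otherwise each entry shows the expected version and what was found (`None` for a missing package).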


  • COCO Dataset loader and build pre-processing engine
  • Build LSTM Generator
  • Incorporate emotions into the Generator
  • Generator Logger
  • Build Conventional Discriminator
  • Discriminator Logger
  • GAN train engine
  • Validation engines
  • Record examples of generated captions in GAN structure
  • SentiCap Dataset loader and build pre-processing engine
  • Build CapsNet Discriminator
  • Inference engine
  • Train and evaluate
  • Plots


  1. Run ./download.sh and skip to step 4. To download the datasets manually instead, continue with step 2.
  2. Download the Microsoft COCO Dataset, which provides the neutral image caption data. Images: 2014 Train images [83K/13GB], 2014 Val images [41K/6GB], 2014 Test images [41K/6GB]. Captions: 2014 Train/Val annotations [241MB]. Extract them to the folder data/images.
  3. Download the SentiCap Dataset, which provides the sentiment-bearing image caption data, and extract only the file data/senticap_dataset.json to data/annotations.
  4. Download the VGG network used for feature extraction and move it to the folder data/.
  5. Run python resize.py --input_folder_dir ./data/images/train2014/ --output_folder_dir ./data/images/train2014_resized/ && python resize.py --input_folder_dir ./data/images/val2014/ --output_folder_dir ./data/images/val2014_resized/ (resizes the downloaded images to [224, 224] and puts them in data/images).
  6. Run python prepro.py --coco_dataset_portions 1. 0.8 0.2 --senticap_dataset_portions 0.8 0.19 0.01, where the first, second, and third entries of each argument are the split portions taken from the original dataset.
  7. Run python train.py --gen_train --gen_save_model_dir ./model/generator/ --gen_dataset coco --batchsize 8 --gen_epochs 10 to pretrain the generator.
  8. Run python train.py --disc_train --disc_network capsnet --gen_load_model_dir ./model/generator/ --disc_save_model_dir ./model/discriminator/ --disc_dataset coco --batchsize 8 --disc_epochs 10 to pretrain the discriminator.
  9. Run python train.py --gan_train --disc_network capsnet --gen_load_model_dir ./model/generator/ --disc_load_model_dir ./model/discriminator/ --gan_save_model_dir ./model/gan/ --gan_dataset senticap --batchsize 8 --gan_epochs 10 to train the GAN. The arguments --gen_load_model_dir and --disc_load_model_dir are optional; include one or both to initialize the model with a pretrained generator and/or discriminator.
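
The three training stages in steps 7 to 9 must run in order, because each stage loads the model saved by the previous one. As a rough sketch, they could be chained from a small driver script. The flags and directories below are copied from the commands above; `build_stage_commands` and `run_pipeline` are hypothetical helpers, not part of the repository.

```python
import subprocess
import sys

GEN_DIR = "./model/generator/"
DISC_DIR = "./model/discriminator/"
GAN_DIR = "./model/gan/"


def build_stage_commands(batchsize=8, epochs=10, disc_network="capsnet"):
    """Return the train.py invocations for steps 7, 8 and 9, in order."""
    base = [sys.executable, "train.py"]
    batch = ["--batchsize", str(batchsize)]
    pretrain_generator = base + [
        "--gen_train",
        "--gen_save_model_dir", GEN_DIR,
        "--gen_dataset", "coco",
    ] + batch + ["--gen_epochs", str(epochs)]
    pretrain_discriminator = base + [
        "--disc_train",
        "--disc_network", disc_network,
        "--gen_load_model_dir", GEN_DIR,
        "--disc_save_model_dir", DISC_DIR,
        "--disc_dataset", "coco",
    ] + batch + ["--disc_epochs", str(epochs)]
    train_gan = base + [
        "--gan_train",
        "--disc_network", disc_network,
        "--gen_load_model_dir", GEN_DIR,
        "--disc_load_model_dir", DISC_DIR,
        "--gan_save_model_dir", GAN_DIR,
        "--gan_dataset", "senticap",
    ] + batch + ["--gan_epochs", str(epochs)]
    return [pretrain_generator, pretrain_discriminator, train_gan]


def run_pipeline(dry_run=True):
    """Print each stage; execute it as well when dry_run is False."""
    for command in build_stage_commands():
        print(" ".join(command))
        if not dry_run:
            # check=True stops the pipeline if a stage exits with an error.
            subprocess.run(command, check=True)
```

With the default `dry_run=True`, `run_pipeline()` only prints the three commands, which is a cheap way to confirm the paths before starting a long training run.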

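Step 6 passes three floats to each portion argument. The source does not show how prepro.py reads them, so the following is only a sketch of one way such arguments could be parsed with `argparse`, assuming `nargs=3` and `type=float`; `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Sketch of a parser for the two portion arguments used in step 6."""
    parser = argparse.ArgumentParser(description="Sketch of the portion arguments")
    for flag in ("--coco_dataset_portions", "--senticap_dataset_portions"):
        # Assumption: each flag takes exactly three float values.
        parser.add_argument(flag, nargs=3, type=float,
                            metavar=("FIRST", "SECOND", "THIRD"))
    return parser


# Parse the same values that appear in the step 6 command line.
args = build_parser().parse_args(
    "--coco_dataset_portions 1. 0.8 0.2 "
    "--senticap_dataset_portions 0.8 0.19 0.01".split()
)
print(args.coco_dataset_portions)      # [1.0, 0.8, 0.2]
print(args.senticap_dataset_portions)  # [0.8, 0.19, 0.01]
```

Note that `1.` is a valid float literal and parses as `1.0`.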

  1. Run python inference.py --word_to_idx_dir data/word_to_idx.pkl --image "test.jpg" --load_model_dir model/gan/ to describe an image.
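
The inference command takes a `word_to_idx.pkl` file. As an illustration of how such a vocabulary can turn predicted indices back into text, here is a minimal sketch. It assumes the file holds a pickled dict mapping each word to an integer index, which the source does not confirm; `load_idx_to_word`, `decode`, and the `<unk>` placeholder are hypothetical and not taken from inference.py.

```python
import pickle


def load_idx_to_word(path):
    """Load a pickled word -> index dict and return the inverse mapping."""
    with open(path, "rb") as f:
        word_to_idx = pickle.load(f)
    return {idx: word for word, idx in word_to_idx.items()}


def decode(indices, idx_to_word, unknown="<unk>"):
    """Join the words for a sequence of indices into a caption string."""
    # Indices missing from the vocabulary fall back to the placeholder.
    return " ".join(idx_to_word.get(i, unknown) for i in indices)
```

For example, `decode(indices, load_idx_to_word("data/word_to_idx.pkl"))` would produce a space-separated caption for a list of predicted indices.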

