Lightweight Generative Adversarial Networks for Text-Guided Image Manipulation

PyTorch implementation for "Lightweight Generative Adversarial Networks for Text-Guided Image Manipulation". The goal is to introduce a lightweight generative adversarial network that performs efficient image manipulation using natural language descriptions.

Overview

Lightweight Generative Adversarial Networks for Text-Guided Image Manipulation.
Bowen Li, Xiaojuan Qi, Philip H. S. Torr, Thomas Lukasiewicz.
University of Oxford, University of Hong Kong
NeurIPS 2020

Data

  1. Download the preprocessed metadata for bird and coco, and save both into data/
  2. Download bird dataset and extract the images to data/birds/
  3. Download coco dataset and extract the images to data/coco/
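After the three steps above, the data directory should look roughly like this (a sketch; the file names inside each folder depend on the downloaded archives):

```
data/
├── birds/   # bird images (step 2) + preprocessed bird metadata (step 1)
└── coco/    # COCO images (step 3) + preprocessed coco metadata (step 1)
```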

Training

All code was developed and tested on CentOS 7 with Python 3.7 (Anaconda) and PyTorch 1.1.

The DAMSM model includes a text encoder and an image encoder.

  • Pre-train DAMSM model for bird dataset:
python pretrain_DAMSM.py --cfg cfg/DAMSM/bird.yml --gpu 0
  • Pre-train DAMSM model for coco dataset:
python pretrain_DAMSM.py --cfg cfg/DAMSM/coco.yml --gpu 1

Our Model

  • Train the model for bird dataset:
python main.py --cfg cfg/train_bird.yml --gpu 2
  • Train the model for coco dataset:
python main.py --cfg cfg/train_coco.yml --gpu 3

The *.yml files contain the configuration for training and testing. To reduce the model's parameter count, edit the DF_DIM and/or GF_DIM values in the corresponding *.yml file.
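For reference, a reduced-capacity setting might look like the following. This is a hypothetical excerpt: only the DF_DIM and GF_DIM key names come from the text above, and the surrounding structure and values are illustrative, not copied from the shipped configs.

```yml
GAN:
  DF_DIM: 32   # discriminator feature dimension; smaller values mean fewer parameters
  GF_DIM: 32   # generator feature dimension; smaller values mean fewer parameters
```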

Pretrained DAMSM Model

Pretrained Lightweight Model

  • Bird. Download and save it to models/

  • COCO. Download and save it to models/

Testing

  • Test our model on bird dataset:
python main.py --cfg cfg/eval_bird.yml --gpu 4
  • Test our model on coco dataset:
python main.py --cfg cfg/eval_coco.yml --gpu 5

Evaluation

  • To generate images for all captions in the test set, set B_VALIDATION to True in the corresponding eval_*.yml.
  • Compute the Fréchet Inception Distance (FID) on the generated images to evaluate visual quality.
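As a reference for the metric itself, below is a minimal sketch of the Fréchet distance between two Gaussians fitted to Inception activations. The activation extraction with an Inception network is omitted; the function name and the NumPy/SciPy usage are my own, not taken from this repository.

```python
import numpy as np
from scipy import linalg


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """FID = ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2*sqrt(sigma1 @ sigma2)).

    mu*: mean vectors of the activation sets; sigma*: their covariance matrices.
    """
    diff = mu1 - mu2
    # Matrix square root of the covariance product; disp=False also returns
    # an error estimate, which we discard.
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    covmean = covmean.real  # drop tiny imaginary parts from numerical error
    return diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean)
```

Identical statistics yield a distance of zero; the score grows as the generated and real activation distributions drift apart.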

Code Structure

  • code/main.py: the entry point for training and testing.
  • code/trainer.py: creates the networks, and manages and reports training progress.
  • code/model.py: defines the architecture.
  • code/attention.py: defines the spatial and channel-wise attentions.
  • code/VGGFeatureLoss.py: defines the architecture of the VGG-16.
  • code/datasets.py: defines the class for loading images and captions.
  • code/pretrain_DAMSM.py: trains the text and image encoders, and manages and reports training progress.
  • code/miscc/losses.py: defines and computes the losses.
  • code/miscc/config.py: creates the option list.
  • code/miscc/utils.py: additional functions.

Citation

If you find this code useful for your research, please cite:

@article{li2020lightweight,
  title={Lightweight Generative Adversarial Networks for Text-Guided Image Manipulation},
  author={Li, Bowen and Qi, Xiaojuan and Torr, Philip and Lukasiewicz, Thomas},
  journal={Advances in Neural Information Processing Systems},
  volume={33},
  year={2020}
}

Acknowledgements

This code borrows heavily from the ManiGAN and ControlGAN repositories. Many thanks.