PyTorch version of the Char-CNN-RNN model and its training/evaluation
procedures, as described in the paper "Learning Deep Representations of
Fine-Grained Visual Descriptions".
This repository also provides instructions on how to extract and use the original weights for two papers:
- Learning Deep Representations of Fine-Grained Visual Descriptions
- Generative Adversarial Text-to-Image Synthesis
This implementation requires PyTorch >= 1.1.0 and Python 3. Check the
`requirements.txt` file for information on other packages.
The `scripts` folder contains bash scripts that reproduce the original
training and evaluation procedures.
- Install the repository requirements via `pip install -r requirements.txt`.
- Download the datasets provided by the original author.
- Run `python sje_train.py -h` for training instructions (or check the
`scripts` folder). You can open TensorBoard to check live training results.
- After training, run `python sje_eval.py -h` for instructions on the
evaluation procedures.
This implementation currently only accepts the original model weights for the birds and flowers datasets.
- Download the pretrained Char-CNN-RNN models for the paper you are interested in:
- Check `example.py` for instructions on how to extract model weights and how
to use the provided implementation.
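The actual extraction helpers live in `example.py`; as a hedged illustration of the general pattern only (the `TinyEncoder` class and file path below are stand-ins, not this repository's API), reusing extracted weights boils down to a `state_dict` round trip:

```python
# Hedged sketch of reusing extracted weights via a state_dict round trip.
# TinyEncoder is a hypothetical stand-in; the real model class and the
# converted weight files come from this repository (see example.py).
import os
import tempfile

import torch
import torch.nn as nn


class TinyEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 4)  # placeholder layer

    def forward(self, x):
        return self.fc(x)


# Pretend these are the extracted weights saved to disk.
path = os.path.join(tempfile.mkdtemp(), 'weights.pth')
torch.save(TinyEncoder().state_dict(), path)

# Loading requires a model whose parameter names and shapes match the file.
model = TinyEncoder()
model.load_state_dict(torch.load(path))
model.eval()
with torch.no_grad():
    emb = model(torch.zeros(1, 8))
print(tuple(emb.shape))  # (1, 4)
```

`load_state_dict` fails loudly on mismatched names or shapes, which is useful for verifying that a weight conversion actually lines up with the PyTorch model definition.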
The Char-CNN-RNN model is prevalent in the text-to-image task: it processes
image descriptions into embeddings that contain visually relevant features.
This PyTorch translation may be useful for researchers interested in using
Char-CNN-RNN models without relying on precomputed embeddings, which is
especially handy for testing models with novel image descriptions or new
datasets.
To use custom datasets, you will have to create a PyTorch Dataset, which
should load and preprocess instances (check `dataset.py` for inspiration).
The original preprocessing steps for images and text are described in
Section 5 of the original paper. Your dataset should return a dictionary
containing the following information:

- `img`: image data. In the original implementation, this is a
1024-dimensional feature vector. The dimensions of the image data and the
processed text data (the Char-CNN-RNN output) must match.
- `txt`: textual data. Your dataset should return a one-hot representation
(check the text utility functions in `char_cnn_rnn/char_cnn_rnn.py`). The
allowed characters are lowercase alphabetical characters and punctuation.
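As a minimal sketch of this contract (the alphabet, maximum length, and the `CaptionDataset` class below are illustrative assumptions, not the repository's exact implementation; check `char_cnn_rnn/char_cnn_rnn.py` for the real text utilities), a custom dataset could look like:

```python
# Illustrative sketch of a custom dataset returning the {'img', 'txt'} dict.
# The alphabet and dimensions are assumptions. In practice you would subclass
# torch.utils.data.Dataset; plain Python is used here so the sketch stays
# self-contained.
import string

# Assumed alphabet: lowercase letters plus some punctuation (and space).
ALPHABET = string.ascii_lowercase + ' .,;!?:\'"'


def onehot(text, max_len=201):
    """Encode a string as a (max_len, len(ALPHABET)) one-hot list of lists."""
    text = text.lower()[:max_len]
    rows = []
    for i in range(max_len):
        row = [0] * len(ALPHABET)
        if i < len(text) and text[i] in ALPHABET:
            row[ALPHABET.index(text[i])] = 1
        rows.append(row)  # all-zero row pads past the end of the string
    return rows


class CaptionDataset:
    """Toy stand-in for a torch.utils.data.Dataset subclass."""

    def __init__(self, images, captions):
        self.images = images      # each a 1024-dimensional feature vector
        self.captions = captions  # one description string per image

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        return {'img': self.images[idx], 'txt': onehot(self.captions[idx])}


ds = CaptionDataset([[0.0] * 1024], ['a small bird with red wings'])
sample = ds[0]
print(len(sample['img']), len(sample['txt']), len(sample['txt'][0]))
```

The key point is the returned dictionary shape: `img` carries the image features and `txt` carries the one-hot character matrix that the Char-CNN-RNN consumes.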
- Add MS-COCO dataset (used in ICML paper)
- Add evaluation visualization