Reproducing OpenAI's DALLE model

DALLE-reproduction

This repository shares pre-trained reproductions of OpenAI's DALLE model and a notebook for generating images from text prompts.

All models were trained with lucidrains/DALLE-pytorch and VQGAN (Taming Transformers), but with different training code and a different BPE model.

If you want to train DALLE yourself, please go to lucidrains/DALLE-pytorch and support their work on reproducing better DALLE models ✈️

The notebook includes

1. Text to image generation

2. Pre-trained CLIP reranking

  • CUB200

  • COCO

3. Generate rest of image based on the given cropped image

  • CUB200

  • COCO
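CLIP reranking, as listed above, generally means scoring each generated image against the text prompt and keeping the best matches. The following is a minimal sketch of that idea, not the notebook's actual code. It assumes the text and image embeddings have already been produced by a CLIP model; the function names `cosine_similarity` and `rerank` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)


def rerank(text_embedding, image_embeddings, top_k=None):
    """Return image indices sorted from best to worst match with the text.

    Assumption: embeddings come from a CLIP text/image encoder pair.
    """
    scored = [
        (cosine_similarity(text_embedding, emb), idx)
        for idx, emb in enumerate(image_embeddings)
    ]
    # Highest similarity first; the sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    ranked = [idx for _, idx in scored]
    return ranked if top_k is None else ranked[:top_k]
```

Calling `rerank(text_embedding, image_embeddings, top_k=8)` would return the indices of the eight generated images whose embeddings are closest to the prompt.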

Usage

  1. Install requirements
$ pip install -r requirements
  2. Install DeepSpeed
  • DeepSpeed is only necessary for attention type 'sparse'.
  • Follow the instructions here to install DeepSpeed
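Because DeepSpeed is only needed for 'sparse' attention, it can help to check a model's attention types before loading it. The sketch below is one way to do that under stated assumptions: the helper names are hypothetical, and the only library fact relied on is that DeepSpeed installs as the importable package `deepspeed`.

```python
import importlib.util


def needs_deepspeed(attn_types):
    """True if any attention type in the tuple is 'sparse'."""
    return "sparse" in attn_types


def deepspeed_available():
    """True if the `deepspeed` package can be found, without importing it."""
    return importlib.util.find_spec("deepspeed") is not None


def check_attention_config(attn_types):
    """Raise early if the model needs DeepSpeed but it is not installed."""
    if needs_deepspeed(attn_types) and not deepspeed_available():
        raise RuntimeError(
            "Attention type 'sparse' requires DeepSpeed, which is not installed."
        )
```

For example, `check_attention_config(('full', 'axial_row', 'axial_col', 'conv_like'))` passes without DeepSpeed, while `check_attention_config(('full', 'sparse'))` raises unless DeepSpeed is installed.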

Models

  • Download models below and save them in pretrained folder
  • Check the link in Details for the model specifics
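| Dataset | Download | Password | Optimizer | Attention type | Size | Details |
|---------|----------|----------|-----------|----------------|------|---------|
| CUB200 | link | v9ge | Adam | ('full', 'sparse') | 1.1GB | link |
| CUB200 | link | eui3 | Adam | ('full', 'axial_row', 'axial_col', 'conv_like') | 1.1GB | link |
| CUB200 | link | 47w1 | AdamW | ('full', 'sparse') | 1.1GB | link |
| COCO | link | p3ki | Adam | ('full', 'sparse') | 1.5GB | link |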
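After downloading, a quick check that the files landed in the pretrained folder can save a failed notebook run. This is a small sketch under assumptions: the helper `list_checkpoints` is hypothetical, and the `.pt` / `.pth` suffixes are guesses at the checkpoint file extensions, so adjust them to match the files you actually downloaded.

```python
from pathlib import Path


def list_checkpoints(folder="pretrained", suffixes=(".pt", ".pth")):
    """Return the sorted names of checkpoint files found in `folder`.

    Assumption: checkpoints use one of the given suffixes.
    Returns an empty list if the folder does not exist.
    """
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(
        p.name for p in root.iterdir() if p.is_file() and p.suffix in suffixes
    )
```

An empty result from `list_checkpoints()` suggests the models were saved somewhere else or use a different file extension.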