glide-text2im

GLIDE: a diffusion-based text-conditional image synthesis model. Now with example files for running locally.


GLIDE

This is a fork of the official codebase for running the small, filtered-data GLIDE model from GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models.

For details on the pre-trained models in this repository, see the Model Card.

Usage

To install this package, clone this repository and then run:

pip install -e .

For detailed usage examples, see the notebooks directory.

  • The text2im notebook shows how to use GLIDE (filtered) with classifier-free guidance to produce images conditioned on text prompts. The local version of this notebook is text2im.py; a condensed sketch of its flow follows this list.
  • The inpaint notebook shows how to use GLIDE (filtered) to fill in a masked region of an image, conditioned on a text prompt. The local version of this notebook is inpaint.py.
  • The clip_guided notebook shows how to use GLIDE (filtered) + a filtered noise-aware CLIP model to produce images conditioned on text prompts. The local version of this notebook is clip_guided.py.
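
For orientation, here is a condensed sketch of the classifier-free-guidance flow from the text2im notebook, trimmed to the base 64x64 model (the prompt, guidance scale, and step count are illustrative values; the notebook additionally runs the upsampler to reach 256x256):

import torch as th
from glide_text2im.download import load_checkpoint
from glide_text2im.model_creation import (
    create_model_and_diffusion,
    model_and_diffusion_defaults,
)

device = th.device('cuda' if th.cuda.is_available() else 'cpu')

# Create the base 64x64 model, respaced to 100 diffusion steps for fast sampling.
options = model_and_diffusion_defaults()
options['use_fp16'] = device.type == 'cuda'
options['timestep_respacing'] = '100'
model, diffusion = create_model_and_diffusion(**options)
model.eval()
if device.type == 'cuda':
    model.convert_to_fp16()
model.to(device)
model.load_state_dict(load_checkpoint('base', device))

prompt = 'painting of an apple'  # illustrative prompt
batch_size = 1
guidance_scale = 3.0

# Tokenize the prompt and an empty (unconditional) prompt.
tokens = model.tokenizer.encode(prompt)
tokens, mask = model.tokenizer.padded_tokens_and_mask(tokens, options['text_ctx'])
uncond_tokens, uncond_mask = model.tokenizer.padded_tokens_and_mask([], options['text_ctx'])

# Pack the conditional and unconditional halves into one batch.
model_kwargs = dict(
    tokens=th.tensor([tokens] * batch_size + [uncond_tokens] * batch_size, device=device),
    mask=th.tensor([mask] * batch_size + [uncond_mask] * batch_size,
                   dtype=th.bool, device=device),
)

def model_fn(x_t, ts, **kwargs):
    # Classifier-free guidance: run both halves on the same noised image, then
    # extrapolate from the unconditional toward the conditional noise prediction.
    half = x_t[: len(x_t) // 2]
    combined = th.cat([half, half], dim=0)
    model_out = model(combined, ts, **kwargs)
    eps, rest = model_out[:, :3], model_out[:, 3:]
    cond_eps, uncond_eps = th.split(eps, len(eps) // 2, dim=0)
    half_eps = uncond_eps + guidance_scale * (cond_eps - uncond_eps)
    eps = th.cat([half_eps, half_eps], dim=0)
    return th.cat([eps, rest], dim=1)

model.del_cache()
samples = diffusion.p_sample_loop(
    model_fn,
    (batch_size * 2, 3, options['image_size'], options['image_size']),
    device=device,
    clip_denoised=True,
    progress=True,
    model_kwargs=model_kwargs,
    cond_fn=None,
)[:batch_size]  # keep only the conditional half of the batch
model.del_cache()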

Local versions

Converted Notebooks

The local versions of the notebooks stay as close as possible to the original notebooks, which remain unchanged in this repository. Changes in the local versions include:

  • No dependency on IPython's "display"
  • Individual images are saved in addition to the image strip (by default, only the upscaled images are saved); the saving step is sketched below
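
In place of display, the converted scripts write images to disk. Conceptually, the saving step looks like the following sketch (the function and file names here are illustrative, not the scripts' actual internals):

import torch as th
from PIL import Image

def save_images(samples: th.Tensor, prefix: str = 'glide') -> None:
    # Map samples from [-1, 1] to 8-bit RGB and move channels last.
    scaled = ((samples + 1) * 127.5).round().clamp(0, 255).to(th.uint8)
    scaled = scaled.permute(0, 2, 3, 1).cpu().numpy()
    for i, img in enumerate(scaled):
        Image.fromarray(img).save(f'{prefix}_{i}.png')  # one file per batch item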

Generation script

Additionally, a more command-line-friendly generation script, generate.py, is available. It can be configured to use either classifier-free guidance or CLIP guidance.

To use the generation script, run it with a text prompt as a command-line argument:

python generate.py "Painting of an apple"

Example output for this prompt: a grid of images of "painting of an apple".

Multiple prompts can be specified, separated by "||"; individual batch items cycle through the prompts in order. This makes it possible to evaluate several variations of a prompt in the same batch, as illustrated below.
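
The cycling itself is straightforward; a minimal sketch of the idea (the function name is hypothetical, not the script's actual internals):

def prompts_for_batch(prompt_arg: str, batch_size: int) -> list[str]:
    # Split on '||' and assign prompts to batch items round-robin.
    prompts = [p.strip() for p in prompt_arg.split('||')]
    return [prompts[i % len(prompts)] for i in range(batch_size)]

# e.g. a batch of 4 over two prompts -> [apple A, apple B, apple A, apple B]
print(prompts_for_batch('a red apple || a green apple', 4))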

Parameters for configuring the generation script can be viewed with the -h flag:

> python generate.py -h
usage: GLIDE Text2Image [-h] [-s S] [-gs GS] [-cf] [-tb TB] [-tu TU] [-ut UT] [-ss] [-ni] [-v] [-rc RC] [prompt]

positional arguments:
  prompt      Prompt for image generation. Batch items cycle through multiple prompts separated by ||

optional arguments:
  -h, --help  show this help message and exit
  -s S        Batch size: Higher values generate more images at once while using more RAM
  -gs GS      Guidance scale parameter during generation (Higher values may improve quality, but reduce diversity)
  -cf         Use classifier-free guidance instead of CLIP guidance. CF guidance may yield 'cleaner' images, while
              CLIP guidance may be better at interpreting more complex prompts.
  -tb TB      Timestep value for base model. For faster generation, lower values (e.g. '100') can be used
  -tu TU      Timestep value for upscaler. For faster generation, use 'fast27'
  -ut UT      Temperature value for the upscaler. '1.0' will result in sharper, but potentially noisier/grainier
              images
  -ss         Additionally save the small 64x64 images (before the upscaling step)
  -ni         Don't save individual images (after the upscaling step)
  -v          Verbose mode: print additional runtime information
  -rc RC      Number of different random prompts to use when no prompt is given

Text2Image generation using GLIDE, with classifier-free or CLIP guidance.
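
As a worked example combining several of the flags above (all values illustrative): classifier-free guidance, a batch of four, guidance scale 5.0, 100 base timesteps, and the fast upsampler schedule:

python generate.py -cf -s 4 -gs 5.0 -tb 100 -tu fast27 "Painting of an apple"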