VectorAscent: Generate vector graphics from a textual description

Example

"a painting of an evergreen tree"

python text_to_painting.py --prompt "a painting of an evergreen tree" --num_iter 2500 --use_blob --subdir vit_rn50_useblob

We rely on CLIP for its aligned text and image encoders, and on diffvg, a differentiable vector graphics rasterizer. Differentiable rendering lets us generate raster images from vector paths, but on its own it has no notion of a textual description. CLIP fills that gap by scoring the similarity between a raster image and a text caption. Using gradient ascent, we then optimize for a vector graphic whose rasterization has high similarity with a user-provided caption, backpropagating through CLIP and diffvg to the vector graphics parameters. This project is partially inspired by Deep Daze, a caption-guided raster graphics generator.
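The optimization loop described above can be illustrated with a dependency-free toy. This is only a sketch under loud assumptions, not the code in text_to_painting.py: `render` is a hypothetical fixed linear map standing in for diffvg, plain cosine similarity stands in for the CLIP score, and the gradient is written out by hand where the real project would use PyTorch autograd. All names and values here are illustrative.

```python
import math


def render(params, weights):
    """Toy stand-in for a differentiable rasterizer: a fixed linear map."""
    return [sum(w * p for w, p in zip(row, params)) for row in weights]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def cosine_similarity(image, text):
    """Toy stand-in for the CLIP score between an image and a caption."""
    dot = sum(x * t for x, t in zip(image, text))
    return dot / (norm(image) * norm(text))


def similarity_grad(image, text):
    """Analytic gradient of the cosine similarity with respect to `image`."""
    dot = sum(x * t for x, t in zip(image, text))
    n_img, n_txt = norm(image), norm(text)
    return [t / (n_img * n_txt) - dot * x / (n_img ** 3 * n_txt)
            for x, t in zip(image, text)]


def gradient_ascent(params, weights, text, num_iter=200, lr=0.1):
    """Maximize similarity(render(params), text) over the path parameters."""
    params = list(params)
    for _ in range(num_iter):
        image = render(params, weights)
        g_image = similarity_grad(image, text)
        # Chain rule through the linear renderer: d(sim)/d(params) = W^T g_image.
        g_params = [sum(weights[i][j] * g_image[i] for i in range(len(weights)))
                    for j in range(len(params))]
        # Ascent step: move the parameters in the direction of the gradient.
        params = [p + lr * g for p, g in zip(params, g_params)]
    return params


weights = [[1.0, 1.0], [1.0, -1.0]]   # hypothetical "renderer" weights
text = [1.0, -1.0]                    # hypothetical caption embedding
start = [1.0, 0.0]                    # initial "vector graphic" parameters

before = cosine_similarity(render(start, weights), text)
final = gradient_ascent(start, weights, text)
after = cosine_similarity(render(final, weights), text)
print(f"similarity before: {before:.3f}, after: {after:.3f}")
```

The structure mirrors the description: render the parameters, score the result against the caption, and push the score's gradient back through the renderer to update the parameters. In the real pipeline both stages are neural or rasterization code, so the hand-derived gradient would be replaced by automatic differentiation.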

Quick start

Requirements:

torch
torchvision
matplotlib
numpy
scikit-image
clip
diffvg

Install the dependencies and CLIP:

conda install --yes -c pytorch pytorch=1.7.1 torchvision cudatoolkit=11.0
pip install ftfy regex tqdm numpy matplotlib scikit-image
pip install git+https://github.com/openai/CLIP.git

Then install diffvg by following its installation instructions.

Citation

@software{jain21vector,
  author = {Jain, Ajay},
  title = {VectorAscent: Generate vector graphics from a textual description},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  url = {https://github.com/ajayjain/VectorAscent}
}