"a painting of an evergreen tree"
python text_to_painting.py --prompt "a painting of an evergreen tree" --num_iter 2500 --use_blob --subdir vit_rn50_useblob
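To run the script over several prompts, the command above can be assembled programmatically. The sketch below is only an illustration: `build_command` is a hypothetical helper, it uses just the flags that appear in the command above, and it assumes `--use_blob` is an on/off switch that can simply be omitted.

```python
import shlex


def build_command(prompt, num_iter=2500, use_blob=True, subdir=None):
    """Assemble an argument list for text_to_painting.py (hypothetical helper)."""
    cmd = ["python", "text_to_painting.py",
           "--prompt", prompt,
           "--num_iter", str(num_iter)]
    if use_blob:
        # Assumption: --use_blob is a boolean switch that takes no value.
        cmd.append("--use_blob")
    if subdir is not None:
        cmd += ["--subdir", subdir]
    return cmd


if __name__ == "__main__":
    jobs = [("a painting of an evergreen tree", "vit_rn50_useblob")]
    for prompt, subdir in jobs:
        cmd = build_command(prompt, subdir=subdir)
        # Print a shell-quoted command line; pass `cmd` to subprocess.run to execute it.
        print(shlex.join(cmd))
```

Building the command as a list of arguments avoids quoting problems with prompts that contain spaces.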
We rely on CLIP for its aligned text and image encoders, and on diffvg, a differentiable vector graphics rasterizer. Differentiable rendering lets us generate raster images from vector paths, but the renderer itself has no notion of textual descriptions, so we use CLIP to score the similarity between a rasterized image and a text caption. Using gradient ascent, we then optimize for a vector graphic whose rasterization has high similarity with a user-provided caption, backpropagating through CLIP and diffvg to the vector graphics parameters. This project is partially inspired by Deep Daze, a caption-guided raster graphics generator.
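The optimization described above can be illustrated with a toy sketch. This is not the project's code and it does not call CLIP or diffvg. `toy_image_features` is a made-up stand-in for "rasterize, then encode the image", `TOY_TEXT_FEATURES` is a made-up stand-in for the caption embedding, and gradients are estimated with finite differences rather than backpropagation. All names and values are assumptions chosen for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def toy_image_features(params):
    """Toy stand-in for rasterizing vector parameters and encoding the image."""
    x, y = params
    return [x, y, x + y]


# Toy stand-in for the text embedding of the caption.
TOY_TEXT_FEATURES = [1.0, 2.0, 3.0]


def score(params):
    """Similarity between the toy image features and the toy caption features."""
    return cosine_similarity(toy_image_features(params), TOY_TEXT_FEATURES)


def finite_difference_gradient(fn, params, eps=1e-6):
    """Central-difference estimate of the gradient (used here in place of backprop)."""
    grad = []
    for i in range(len(params)):
        up = list(params)
        down = list(params)
        up[i] += eps
        down[i] -= eps
        grad.append((fn(up) - fn(down)) / (2 * eps))
    return grad


def gradient_ascent(fn, params, lr=0.2, num_iter=300):
    """Repeatedly step the parameters in the direction that increases fn."""
    params = list(params)
    for _ in range(num_iter):
        grad = finite_difference_gradient(fn, params)
        params = [p + lr * g for p, g in zip(params, grad)]
    return params


if __name__ == "__main__":
    start = [1.0, 0.5]
    end = gradient_ascent(score, start)
    print("similarity before:", score(start))
    print("similarity after: ", score(end))
```

In the real pipeline the parameters would be the vector path parameters, and the gradient would come from backpropagating through both models instead of from finite differences.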
Requirements:
- torch
- torchvision
- matplotlib
- numpy
- scikit-image
- clip
- diffvg
Install the dependencies and CLIP:
conda install --yes -c pytorch pytorch=1.7.1 torchvision cudatoolkit=11.0
pip install ftfy regex tqdm numpy matplotlib scikit-image
pip install git+https://github.com/openai/CLIP.git
Then install diffvg by following the installation instructions in the diffvg repository.
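After installation, a quick check can confirm that the packages are importable. The sketch below is a hypothetical helper, not part of the project. The import names are assumptions based on common usage: scikit-image is imported as `skimage`, and the diffvg Python bindings are assumed to be importable as `pydiffvg`.

```python
import importlib.util

# Assumed import names for the requirements listed above.
REQUIRED_MODULES = [
    "torch",
    "torchvision",
    "matplotlib",
    "numpy",
    "skimage",   # scikit-image
    "clip",
    "pydiffvg",  # assumed import name of the diffvg bindings
]


def find_missing(modules):
    """Return the module names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

The check only locates the modules and does not import them, so it does not verify that CUDA or the compiled diffvg extension actually works.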
@software{jain21vector,
author = {Jain, Ajay},
title = {VectorAscent: Generate vector graphics from a textual description},
year = {2021},
publisher = {GitHub},
journal = {GitHub repository},
url = {https://github.com/ajayjain/VectorAscent}
}