DiffSketcher: Text Guided Vector Sketch Synthesis through Latent Diffusion Models

This repository contains the official implementation of the NeurIPS 2023 paper "DiffSketcher: Text Guided Vector Sketch Synthesis through Latent Diffusion Models", which generates high-quality vector sketches from text prompts. A project page is also available.

🆕 Update

[10/2023] We released the DiffSketcher code.
[10/2023] We released the VectorFusion code.
[10/2023] Thanks to @camenduru, DiffSketcher-colab has been released.

TODO

Add a webUI demo.
Add support for colorful results and oil painting.

🔧 Installation

Create a new conda environment:

conda create --name diffsketcher python=3.10
conda activate diffsketcher

Install PyTorch and the following libraries:

conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.6 -c pytorch -c nvidia
pip install omegaconf BeautifulSoup4
pip install opencv-python scikit-image matplotlib visdom wandb
pip install triton numba
pip install numpy scipy timm scikit-fmm einops
pip install accelerate transformers safetensors datasets

Install CLIP:

pip install ftfy regex tqdm
pip install git+https://github.com/openai/CLIP.git

Install diffusers:

pip install diffusers==0.20.2

Install xformers (requires Python 3.10):

conda install xformers -c xformers

Install diffvg:

git clone https://github.com/BachiLi/diffvg.git
cd diffvg
git submodule update --init --recursive
conda install -y -c anaconda cmake
conda install -y -c conda-forge ffmpeg
pip install svgwrite svgpathtools cssutils torch-tools
python setup.py install
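
Once the steps above are done, a quick check can confirm that the main packages can be found in the active environment. The following is a minimal sketch rather than part of the repository: the helper name `missing_modules` is hypothetical, and the import names listed (for example `cv2` for opencv-python, `skfmm` for scikit-fmm, `pydiffvg` for diffvg) are assumed to be the usual ones for those packages. It only locates modules and does not import them, so it will not catch a broken CUDA setup.

```python
import importlib.util

# Import names assumed for the packages installed above; adjust if yours differ.
REQUIRED = [
    "torch", "torchvision", "omegaconf", "bs4", "cv2", "skimage",
    "numpy", "scipy", "timm", "skfmm", "einops", "accelerate",
    "transformers", "diffusers", "clip", "xformers", "pydiffvg",
]


def missing_modules(names):
    """Return the module names that cannot be located in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules were found.")
```

Run it inside the activated `diffsketcher` environment; any name it reports points back to the corresponding install step.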

🔥 Quickstart

Example:

Script:

python run_painterly_render.py \
  -c diffsketcher.yaml \
  -eval_step 10 -save_step 10 \
  -update "token_ind=4 num_paths=96 sds.warmup=1000 num_iter=1500" \
  -pt "a photo of Sydney opera house" \
  -respath ./workdir/sydney_opera_house \
  -d 8019 \
  --download

-c, a.k.a --config: configuration file, stored in DiffSketcher/config/.
-eval_step: interval (in steps) between evaluations; evaluating too often lengthens the run.
-save_step: interval (in steps) between saving results; saving too often lengthens the run.
-update: overrides hyper-parameters of the configuration file from the command line, so you don't need to create a new yaml.
-pt, a.k.a --prompt: text prompt.
-respath, a.k.a --results_path: the folder where results are saved.
-d, a.k.a --seed: random seed.
--download: automatically download the models from Hugging Face on the first run.
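
To illustrate what an `-update` string expresses, here is a minimal sketch of turning it into nested overrides. It assumes, based only on the example command, that the string holds space-separated `key=value` pairs and that a dotted key such as `sds.warmup` addresses a nested entry. The function name `parse_update` is hypothetical, and this is not necessarily how the repository parses the flag.

```python
import ast


def parse_update(update: str) -> dict:
    """Turn 'a=1 b.c=2' into {'a': 1, 'b': {'c': 2}} (illustrative only)."""
    overrides = {}
    for item in update.split():
        key, _, raw = item.partition("=")
        try:
            # Interpret numbers, True/False, etc.; fall back to a plain string.
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            value = raw
        *parents, leaf = key.split(".")
        node = overrides
        for parent in parents:
            node = node.setdefault(parent, {})
        node[leaf] = value
    return overrides


print(parse_update("token_ind=4 num_paths=96 sds.warmup=1000 num_iter=1500"))
# {'token_ind': 4, 'num_paths': 96, 'sds': {'warmup': 1000}, 'num_iter': 1500}
```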

Crucial:

-update "token_ind=2" selects, by index, the cross-attention map used to initialize the strokes.
-update "num_paths=96" sets the number of strokes.
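
Picking `token_ind` usually means finding the position of the word you want the strokes to follow. The sketch below is a rough approximation under stated assumptions: it assumes the index counts prompt tokens with a start-of-text token at index 0, and that every word maps to exactly one token. A real text tokenizer may split a word into several pieces, so treat the result as a starting guess. The helper name `approx_token_index` is hypothetical.

```python
def approx_token_index(prompt: str, word: str) -> int:
    """Approximate the token index of `word` in `prompt`.

    Assumes whitespace-separated words map one-to-one onto tokens and that
    index 0 is taken by a start-of-text token (both are assumptions).
    """
    words = prompt.lower().split()
    return words.index(word.lower()) + 1  # +1 for the assumed start token


print(approx_token_index("a photo of Sydney opera house", "Sydney"))  # 4
```

Under these assumptions the result agrees with the `token_ind=4` used in the example command for this prompt.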

Optional:

-npt, a.k.a --negative_prompt: negative text prompt.
-mv, a.k.a --make_video: make a video of the rendering process (this takes much longer).
-frame_freq, a.k.a --video_frame_freq: how often a frame is captured for the video.
Note: if xdog_intersec=True, download the U2Net model and place it in the checkpoint/ directory.
Add enable_xformers=True to -update to enable xformers for a speed-up.
Add gradient_checkpoint=True to -update to use gradient checkpointing when VRAM is limited.
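
When launching runs from a script, the extra switches can be folded into the `-update` string programmatically. This is a minimal sketch, not part of the repository: `build_command` is a hypothetical helper, and it only assembles the flags documented above without running anything.

```python
import shlex


def build_command(prompt, results_path, overrides,
                  config="diffsketcher.yaml", seed=8019):
    """Assemble the argument list for run_painterly_render.py (illustrative)."""
    # Join overrides into the space-separated key=value form used by -update.
    update = " ".join(f"{key}={value}" for key, value in overrides.items())
    return [
        "python", "run_painterly_render.py",
        "-c", config,
        "-update", update,
        "-pt", prompt,
        "-respath", results_path,
        "-d", str(seed),
    ]


cmd = build_command(
    "a photo of Sydney opera house",
    "./workdir/sydney_opera_house",
    {"token_ind": 4, "num_paths": 96,
     "enable_xformers": True, "gradient_checkpoint": True},
)
print(shlex.join(cmd))  # shell-quoted command line, ready to copy or run
```

Passing `cmd` to `subprocess.run` would start the run; printing it first makes it easy to check the quoting of the `-update` and `-pt` values.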

Another example

Script:

python run_painterly_render.py \
  -c diffsketcher-width.yaml \
  -eval_step 10 -save_step 10 \
  -update "token_ind=4 num_paths=48 num_iter=500" \
  -pt "a photo of Sydney opera house" \
  -respath ./workdir/sydney_opera_house \
  -d 8019 \
  --download

More Examples

See run.md for more scripts.

📚 Acknowledgement

The project is built on the following repository:

We sincerely thank the authors for their wonderful work.

📎 Citation

If you use this code for your research, please cite the following work:

@inproceedings{xing2023diffsketcher,
  title={DiffSketcher: Text Guided Vector Sketch Synthesis through Latent Diffusion Models},
  author={Xing, Ximing and Wang, Chuang and Zhou, Haitao and Zhang, Jing and Yu, Qian and Xu, Dong},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
  year={2023}
}

©️ Licence

This work is licensed under the MIT License.
