2024/10/16
🤗🤗 Users with autodl accounts can now quickly experience OpenSora-ToCa by directly using our publicly available image!
2024/10/12
🚀🚀 We release ToCa, our work on accelerating diffusion transformers for FREE, which achieves nearly lossless 2.36× acceleration on OpenSora!
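For intuition, here is a minimal sketch of the idea behind token-wise feature caching. This is our simplification for illustration, not the actual ToCa implementation: it assumes a token-wise block such as an MLP and uses a placeholder importance score where ToCa derives scores from attention.

```python
import torch

def toca_like_step(block, x, cache, step, fresh_threshold=4, fresh_ratio=0.07):
    """Hedged sketch of token-wise feature caching (not the real ToCa code).

    Every `fresh_threshold` steps, all tokens are recomputed; in between,
    only the top `fresh_ratio` fraction of tokens (by an importance score)
    is recomputed, and the remaining tokens are reused from the cache.
    """
    if step % fresh_threshold == 0 or cache is None:
        out = block(x)                      # full computation refreshes the cache
        return out, out.detach()
    scores = x.norm(dim=-1)                 # placeholder importance score, shape (B, N)
    n_fresh = max(1, int(fresh_ratio * x.shape[1]))
    idx = scores.topk(n_fresh, dim=1).indices.unsqueeze(-1).expand(-1, -1, x.shape[-1])
    out = cache.clone()
    out.scatter_(1, idx, block(torch.gather(x, 1, idx)))  # recompute only the fresh tokens
    return out, out.detach()

# Usage sketch: block = torch.nn.Linear(1152, 1152); x = torch.randn(2, 256, 1152)
```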
Python >= 3.9, CUDA >= 11.8
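A quick sanity check for these prerequisites (this assumes PyTorch is already installed in the target environment):

```python
import sys
import torch

# Verify the interpreter version and the CUDA toolkit PyTorch was built against.
assert sys.version_info >= (3, 9), "Python >= 3.9 is required"
print("CUDA available:", torch.cuda.is_available())
print("CUDA version:", torch.version.cuda)  # should report >= 11.8
```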
git clone https://github.com/Shenyi-Z/ToCa.git
We evaluated our models under the same environments as the original models, so you can set up your environment by following the requirements of the original models linked below.
Links:
| Original Model | URL |
|---|---|
| DiT | https://github.com/facebookresearch/DiT |
| PixArt-α | https://github.com/PixArt-alpha/PixArt-alpha |
| OpenSora | https://github.com/hpcaitech/Open-Sora |
Besides, we provide replicas of our environments here:
cd DiT-ToCa
conda env create -f environment-dit.yml
cd PixArt-alpha-ToCa
conda env create -f environment-pixart.yml
cd Open-Sora
conda env create -f environment-opensora.yml
pip install -v . # for development mode, `pip install -v -e .`
sample images for visualization
cd DiT-ToCa
python sample.py --image-size 256 --num-sampling-steps 250 --cache-type attention --fresh-threshold 4 --fresh-ratio 0.07 --ratio-scheduler ToCa --force-fresh global --soft-fresh-weight 0.25
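The flags map onto the caching behavior roughly as follows; these glosses are our reading of the method, not documentation generated from the script:

```python
# Hedged gloss of the ToCa flags used above.
toca_flags = {
    "--cache-type attention":   "score token importance with attention maps",
    "--fresh-threshold 4":      "fully recompute all tokens every 4 steps",
    "--fresh-ratio 0.07":       "refresh ~7% of tokens on cached steps",
    "--ratio-scheduler ToCa":   "vary the refresh ratio across timesteps",
    "--force-fresh global":     "select force-refreshed tokens globally",
    "--soft-fresh-weight 0.25": "softly boost the scores of stale tokens",
}
for flag, meaning in toca_flags.items():
    print(f"{flag:28s} {meaning}")
```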
sample images for evaluation (e.g., 50k)
cd DiT-ToCa
torchrun --nnodes=1 --nproc_per_node=6 sample_ddp.py --model DiT-XL/2 --per-proc-batch-size 150 --image-size 256 --cfg-scale 1.5 --num-sampling-steps 250 --cache-type attention --fresh-ratio 0.07 --ratio-scheduler ToCa --force-fresh global --fresh-threshold 4 --soft-fresh-weight 0.25 --num-fid-samples 50000
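For reference, the workload implied by this command; we believe sample_ddp.py rounds the total up to a multiple of the global batch and truncates when packing the .npz, but treat that as an assumption:

```python
import math

# 6 processes x 150 images per process per round.
nproc, per_proc_bs, target = 6, 150, 50_000
global_batch = nproc * per_proc_bs           # 900 images per round
rounds = math.ceil(target / global_batch)    # 56 rounds
print(rounds * global_batch)                 # 50400 generated, truncated to 50000
```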
sample images for visualization
cd PixArt-alpha-ToCa
python scripts/inference.py --model_path /root/autodl-tmp/pretrained_models/PixArt-XL-2-256x256.pth --image_size 256 --bs 100 --txt_file /root/autodl-tmp/test.txt --fresh_threshold 3 --fresh_ratio 0.30 --cache_type attention --force_fresh global --soft_fresh_weight 0.25 --ratio_scheduler ToCa
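The --txt_file argument points at a plain prompt list; we assume one prompt per line, as in the PixArt-α inference scripts:

```python
# Hypothetical helper to create the prompt file used above; one prompt
# per line is our assumption based on the PixArt-alpha inference scripts.
prompts = [
    "A small cactus with a happy face in the Sahara desert.",
    "An astronaut riding a horse on Mars.",
]
with open("/root/autodl-tmp/test.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(prompts))
```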
sample images for evaluation (e.g., 30k for COCO, 1.6k for PartiPrompts)
cd PixArt-alpha-ToCa
torchrun --nproc_per_node=6 scripts/inference_ddp.py --model_path /root/autodl-tmp/pretrained_models/PixArt-XL-2-256x256.pth --image_size 256 --bs 100 --txt_file /root/autodl-tmp/COCO/COCO_caption_prompts_30k.txt --fresh_threshold 3 --fresh_ratio 0.30 --cache_type attention --force_fresh global --soft_fresh_weight 0.25 --ratio_scheduler ToCa
sample video for visualization
cd Open-Sora
python scripts/inference.py configs/opensora-v1-2/inference/sample.py --num-frames 2s --resolution 480p --aspect-ratio 9:16 --prompt "a beautiful waterfall"
sample video for VBench evaluation
cd Open-Sora
bash eval/vbench/launch.sh /root/autodl-tmp/pretrained_models/hpcai-tech/OpenSora-STDiT-v3/model.safetensors 51 opensora-ToCa 480p 9:16
(Remember to replace "/root/autodl-tmp/pretrained_models/hpcai-tech/OpenSora-STDiT-v3/model.safetensors" with your own path!)
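As a sanity check on the arguments (our reading, not documented behavior): the 51 passed to launch.sh is the frame count, which at OpenSora v1.2's 24 fps output corresponds roughly to the 2 s clips sampled above:

```python
# 51 frames at an assumed 24 fps output -> about 2.1 s of video,
# consistent with the "--num-frames 2s" visualization setting.
frames, fps = 51, 24
print(f"{frames / fps:.2f} s")  # 2.12 s
```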
- Thanks to DiT for their great work and codebase upon which we build DiT-ToCa.
- Thanks to PixArt-α for their great work and codebase upon which we build PixArt-α-ToCa.
- Thanks to OpenSora for their great work and codebase upon which we build OpenSora-ToCa.
@article{zou2024accelerating,
title={Accelerating Diffusion Transformers with Token-wise Feature Caching},
author={Zou, Chang and Liu, Xuyang and Liu, Ting and Huang, Siteng and Zhang, Linfeng},
journal={arXiv preprint arXiv:2410.05317},
year={2024}
}
If you have any questions, please email shenyizou@outlook.com.