Diffusion-RWKV: A Python repository from feizc

Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models
_{Official PyTorch Implementation}

This repo contains PyTorch model definitions, pre-trained weights and training/sampling code for our paper scalable diffusion models with RWKV-like architectures, named Diffusion-RWKV. It builds a series of architectures adapted from the RWKV model used in the NLP, with requisite modifications tailored for diffusion model applied to image generation tasks.

1. Environments

Python 3.10
- conda create -n your_env_name python=3.10
Requirements file
- pip install -r requirements.txt
Install mmcv-full and mmcls
- pip install -U openmim
- mim install mmcv-full==1.7.0
- pip install mmcls==0.25.0

2. Training

We provide a training script for Diffusion-RWKV in train.py. This script can be used to train unconditional, class-conditional Diffusion-RWKV models, it can be easily modified to support other types of conditioning.

To launch DRWKV-H/2 (256x256) in the latent space training with N GPUs on one node:

torchrun --nnodes=1 --nproc_per_node=N train.py \
--model DRWKV-H/2 \
--dataset-type imagenet \
--data-path /path/to/imagenet/train \
--image-size 256 \
--latent_space True \
--task-type class-cond \
--vae_path /path/to/vae \
--num-classes 1000

To launch DRWKV-B/2 (32x32) in the pixel space training with N GPUs on one node:

torchrun --nnodes=1 --nproc_per_node=N train.py \
--model DRWKV-B/2 \
--dataset-type celeba \
--data-path /path/to/imagenet/train \
--image-size 32 \
--task-type uncond

There are several additional options; see train.py for details. Experiments for training script can be found in the file direction script.

For convenience, the pre-trained Diffusion-RWKV models can be directly downloaded in huggingface.

3. Evaluation

We include a sample.py script which samples images from a Diffusion-RWKV model. Besides, we support other metrics evaluation, e.g., FLOPS and model parameters, in test.py script.

python sample.py \
--model DRWKV-H/2 \
--ckpt /path/to/model \
--image-size 256 \
--num-classes 1000 \
--cfg-scale 1.5 \
--latent_space True

4. BibTeX

@article{FeiDRWKV2024,
  title={Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models},
  author={Zhengcong Fei, Mingyuan Fan, Changqian Yu, Debang Li, Jusnshi Huang},
  year={2024},
  journal={arXiv preprint},
}

5. Acknowledgments

The codebase is based on the awesome DiT, RWKV, DiS, and Vision-RWKV repos.