Paper | Project Page | Video
Jianyi Wang, Zongsheng Yue, Shangchen Zhou, Kelvin C.K. Chan, Chen Change Loy
S-Lab, Nanyang Technological University
- HuggingFace demo
- Replicate demo
- Code release
- Update link to paper and project page
- Pretrained models
- Colab demo
For more evaluation results, please refer to our paper.
- StableSR is capable of arbitrary upscaling in theory; below is an 8x example with a result beyond 4K (5120x3680). The example image is taken from here.
- We further test StableSR directly on AIGC images and compare it with several diffusion-based upscalers, following the suggestions. A 4K demo is here, which is a 4x SR on the image from here. More comparisons can be found here.
- PyTorch == 1.12.1
- CUDA == 11.7
- pytorch-lightning == 1.4.2
- xformers == 0.0.16 (Optional)
- Other required packages in `environment.yaml`
# git clone this repository
git clone https://github.com/IceClear/StableSR.git
cd StableSR
# Create a conda environment and activate it
conda env create --file environment.yaml
conda activate stablesr
# Install xformers
conda install xformers -c xformers/label/dev
# Install taming & clip
pip install -e git+https://github.com/CompVis/taming-transformers.git@master#egg=taming-transformers
pip install -e git+https://github.com/openai/CLIP.git@main#egg=clip
pip install -e .
Download the pretrained Stable Diffusion models from [HuggingFace]
python main.py --train --base configs/stableSRNew/v2-finetune_text_T_512.yaml --gpus GPU_ID, --name NAME --scale_lr False
- Train CFW: set the ckpt_path in config files (Line 6).
You need to first generate training data using the finetuned diffusion model in the first stage. The data folder should be like this:
CFW_trainingdata/
├── inputs
│   ├── 00000001.png # LR images
│   └── ...
├── gts
│   ├── 00000001.png # GT images
│   └── ...
├── latents
│   ├── 00000001.npy # Latent codes (4D tensors) of HR images generated by the diffusion U-Net, saved in .npy format
│   └── ...
└── samples
    ├── 00000001.png # The HR images decoded from the latent codes, just to make sure the generated latents are correct
    └── ...
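Before training CFW, it may help to sanity-check that the generated data folder matches the layout above. A minimal sketch, assuming the folder names shown here; the function `check_cfw_data` is ours, not part of the repo:

```python
import os
import numpy as np

def check_cfw_data(root="CFW_trainingdata"):
    """Verify that inputs/gts/latents/samples are aligned by file stem
    and that every latent code is a 4D tensor, as described above."""
    stems = sorted(os.path.splitext(f)[0]
                   for f in os.listdir(os.path.join(root, "inputs")))
    for stem in stems:
        # every LR input must have a matching GT, latent, and sample
        for sub, ext in (("gts", ".png"), ("latents", ".npy"), ("samples", ".png")):
            path = os.path.join(root, sub, stem + ext)
            assert os.path.isfile(path), f"missing {path}"
        latent = np.load(os.path.join(root, "latents", stem + ".npy"))
        assert latent.ndim == 4, f"latent {stem} is {latent.ndim}D, expected 4D"
    return len(stems)
```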
Then you can train CFW:
python main.py --train --base configs/autoencoder/autoencoder_kl_64x64x4_resi.yaml --gpus GPU_ID, --name NAME --scale_lr False
python main.py --train --base configs/stableSRNew/v2-finetune_text_T_512.yaml --gpus GPU_ID, --resume RESUME_PATH --scale_lr False
Download the Diffusion and VQGAN pretrained models from [HuggingFace | Google Drive | OneDrive].
You may add `--nocolor` to disable color correction; this may lead to brighter results whose colors are less similar to the LR input.
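The repo's color correction lives in the test scripts; as a rough illustration of the underlying idea (shifting the output's channel-wise color statistics toward the LR input), here is a minimal sketch. `match_color` is a hypothetical helper, not the repo's actual routine:

```python
import numpy as np

def match_color(sr, lr):
    """Channel-wise mean/std matching: shift the SR image's color
    statistics toward those of the (upsampled) LR input.
    sr, lr: float arrays in [0, 1] with shape (H, W, 3)."""
    out = sr.copy()
    for c in range(3):
        mu_sr, std_sr = sr[..., c].mean(), sr[..., c].std() + 1e-8
        mu_lr, std_lr = lr[..., c].mean(), lr[..., c].std()
        # normalize the SR channel, then re-apply the LR statistics
        out[..., c] = (sr[..., c] - mu_sr) / std_sr * std_lr + mu_lr
    return np.clip(out, 0.0, 1.0)
```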
- Test on 128 --> 512: you need at least 10G of GPU memory to run this script (batch size 2 by default).
python scripts/sr_val_ddpm_text_T_vqganfin_old.py --config configs/stableSRNew/v2-finetune_text_T_512.yaml --ckpt CKPT_PATH --vqgan_ckpt VQGANCKPT_PATH --init-img INPUT_PATH --outdir OUT_DIR --ddpm_steps 200 --dec_w 0.5
- Test on arbitrary size w/o chop for VQGAN (for results beyond 512): the memory cost depends on your image size, but is usually above 10G.
python scripts/sr_val_ddpm_text_T_vqganfin_oldcanvas.py --config configs/stableSRNew/v2-finetune_text_T_512.yaml --ckpt CKPT_PATH --vqgan_ckpt VQGANCKPT_PATH --init-img INPUT_PATH --outdir OUT_DIR --ddpm_steps 200 --dec_w 0.5
- Test on arbitrary size w/ chop for VQGAN: the current default setting needs at least 18G to run; you may reduce the VQGAN tile size by setting `--vqgantile_size` and `--vqgantile_stride`. Note that the minimum tile size is 512 and the stride should be smaller than the tile size. A smaller tile size may introduce more border artifacts.
python scripts/sr_val_ddpm_text_T_vqganfin_oldcanvas_tile.py --config configs/stableSRNew/v2-finetune_text_T_512.yaml --ckpt CKPT_PATH --vqgan_ckpt VQGANCKPT_PATH --init-img INPUT_PATH --outdir OUT_DIR --ddpm_steps 200 --dec_w 0.5
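For intuition, overlapping tile placement of the kind controlled by `--vqgantile_size` and `--vqgantile_stride` can be sketched as follows. This is an illustrative computation, not the repo's exact implementation:

```python
def tile_coords(length, tile_size=512, stride=448):
    """Start offsets of overlapping tiles covering `length` pixels.
    The last tile is shifted back so it ends exactly at the border."""
    if length <= tile_size:
        return [0]
    starts = list(range(0, length - tile_size, stride))
    starts.append(length - tile_size)  # final tile flush with the edge
    return starts

# e.g. a 1280-pixel side with the values above:
# tile_coords(1280) -> [0, 448, 768]
```

Because the stride is smaller than the tile size, adjacent tiles overlap, which is what lets the blended result hide seams; a larger overlap (smaller stride) reduces border artifacts at the cost of more tiles to process.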
If our work is useful for your research, please consider citing:
@article{wang2023exploiting,
    author = {Wang, Jianyi and Yue, Zongsheng and Zhou, Shangchen and Chan, Kelvin C.K. and Loy, Chen Change},
    title = {Exploiting Diffusion Prior for Real-World Image Super-Resolution},
    journal = {arXiv preprint arXiv:2305.07015},
    year = {2023}
}
This project is licensed under the NTU S-Lab License 1.0. Redistribution and use should follow the terms of this license.
This project is based on stablediffusion, latent-diffusion, SPADE, mixture-of-diffusers and BasicSR. Thanks for their awesome work.
If you have any questions, please feel free to reach out to me at iceclearwjy@gmail.com.