An unofficial implementation of "CoSeR: Bridging Image and Language for Cognitive Super-Resolution (CVPR 2024)"

Original Paper

Cognitive Super-Resolution (CoSeR) is a Stable Diffusion-based super-resolution (SR) approach that enables SR models to “understand” low-resolution (LR) images.

🔨 Installation

pip install -r requirements.txt

💼 Models

We provide models trained on ImageNet1000 following the settings in the original paper: Qformer, CoSeR.

🌟 Quick inference

Please download Stable Diffusion 2.1 and replace "PATH_FOR_QFORMER" and "PATH_FOR_SD" in configs/CoSeR/inference.yaml with the corresponding paths. We also recommend using the controllable feature wrapping (CFW) from StableSR for full performance. 🤗
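The placeholders can also be filled in with a small script instead of by hand. The following is a minimal sketch, assuming the placeholders appear as literal strings in the YAML file. The helper names `fill_placeholders` and `fill_config` are hypothetical and not part of this repository.

```python
from __future__ import annotations

from pathlib import Path


def fill_placeholders(text: str, paths: dict[str, str]) -> str:
    """Replace each literal placeholder (e.g. PATH_FOR_SD) with its real path."""
    for placeholder, real_path in paths.items():
        if placeholder not in text:
            # Fail loudly instead of silently leaving the config unchanged.
            raise KeyError(f"{placeholder} not found in config text")
        text = text.replace(placeholder, real_path)
    return text


def fill_config(config_file: str, paths: dict[str, str]) -> None:
    """Rewrite the config file in place with all placeholders filled in."""
    cfg = Path(config_file)
    original = cfg.read_text(encoding="utf-8")
    cfg.write_text(fill_placeholders(original, paths), encoding="utf-8")
```

For example, `fill_config("configs/CoSeR/inference.yaml", {"PATH_FOR_QFORMER": ..., "PATH_FOR_SD": ...})` would update the inference config, with each `...` standing for the location of your downloaded checkpoint.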

No image tiling, with reference image generation:

python scripts/inference.py \
--inputdir ... \
--outdir ... \
--config configs/CoSeR/inference.yaml \
--load_ckpt PATH_FOR_COSER \
--vqgan_ckpt PATH_FOR_CFW

With image tiling, for high-resolution image inference:

python scripts/inference_tile.py \
--inputdir ... \
--outdir ... \
--config configs/CoSeR/inference.yaml \
--load_ckpt PATH_FOR_COSER \
--vqgan_ckpt PATH_FOR_CFW
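
Tiled inference generally splits a large input into overlapping windows, processes each window separately, and blends the results. The sketch below only illustrates how overlapping tile coordinates might be computed. The tile size, the overlap, and the function names are assumptions for illustration and are not taken from scripts/inference_tile.py.

```python
def tile_starts(length: int, tile: int, overlap: int) -> list:
    """Start offsets of overlapping tiles that cover [0, length) on one axis."""
    if tile <= 0 or not 0 <= overlap < tile:
        raise ValueError("need tile > 0 and 0 <= overlap < tile")
    if length <= tile:
        return [0]  # a single tile already covers the whole axis
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # align the last tile with the far edge
    return starts


def tile_boxes(width: int, height: int, tile: int = 512, overlap: int = 64) -> list:
    """(left, top, right, bottom) boxes covering a width x height image."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_starts(height, tile, overlap)
        for x in tile_starts(width, tile, overlap)
    ]


if __name__ == "__main__":
    # Hypothetical 1024 x 512 input, split into 512 x 512 tiles with 64 px overlap.
    for box in tile_boxes(1024, 512):
        print(box)
```

Aligning the last tile with the image edge keeps every tile at full size, at the cost of a larger overlap for that final tile.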

🎱 Training

Prepare training data:

Following the original paper, we process the ImageNet1000 images to a size of 512×512. We select a subset of 2000 images as the test set.

python data/prepare_imagenet.py

After that, we use the Real-ESRGAN method to generate LR images for the images in the test set.

We use BLIP2 to generate captions for every HR image from the previous step.

python data/generate_caption.py

Compute the ImageNet intra-class similarity (CLIP similarity), which is used in the training of the reference image attention.

python data/count_clip_sim.py
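
CLIP similarity is commonly measured as the cosine similarity between CLIP embeddings. The sketch below is not the repository's data/count_clip_sim.py. It only illustrates an intra-class computation on embedding vectors that have already been extracted, using plain Python lists. The function names and the toy vectors are hypothetical, and using the result to pick a similar image from the same class as a reference is an assumption.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def intra_class_similarity(embeddings):
    """Pairwise cosine similarity matrix for the embeddings of one class."""
    return [[cosine_similarity(a, b) for b in embeddings] for a in embeddings]


def most_similar(sim_matrix, index):
    """Index of the image most similar to `index`, excluding itself."""
    row = sim_matrix[index]
    candidates = [j for j in range(len(row)) if j != index]
    return max(candidates, key=lambda j: row[j])


if __name__ == "__main__":
    # Toy 2-D vectors standing in for the CLIP embeddings of one class.
    toy_class = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    sims = intra_class_similarity(toy_class)
    print(most_similar(sims, 0))
```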

(Stage 1) Training of the cognitive encoder: (please revise "PATH_FOR_GT" and "PATH_FOR_LR" in the yaml)

python main.py --train --base configs/CoSeR/qformer_srresnet_imagenet_all.yaml --gpus 0,1,2,3,4,5,6,7 --name your_name

(Stage 2) Training of CoSeR: (please revise "PATH_FOR_QFORMER", "PATH_FOR_SD", "PATH_FOR_GT" and "PATH_FOR_LR" in the yaml)

python main.py --train --base configs/CoSeR/aia_512_imagenet_all_caption_clip_atten_ref.yaml --gpus 0,1,2,3,4,5,6,7 --name your_name

💙 Acknowledgments

This project is based on StableSR. Thanks to the authors for their awesome work.