
Super Resolution with Diffusion Probabilistic Model

Primary language: Python. License: MIT.

SR3

A reimplementation of 4x SR3 (https://arxiv.org/abs/2104.07636).

The UNet structure is almost the same as in vanilla DDPM, with three differences: self-attention is performed at the last depth and the depth right before it, group normalization uses 8 groups in total instead of 32, and the linear scale of the embedding generation module is changed from 10,000 to 5,000. As described in the paper, the gamma value is sampled from a uniform distribution between the two alpha values at t-1 and t, and the square root of gamma is fed directly into the embedding generation module.
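The gamma sampling and embedding steps described above can be sketched in plain Python. This is a minimal illustration, not the repository's code. It assumes that the "alpha values" are the cumulative products of `1 - beta`, and that the constant 5,000 takes the place of DDPM's 10,000 as the base of the sinusoidal frequency ladder. All function names and the `scale` parameter are hypothetical.

```python
import math
import random


def linear_betas(start, end, steps):
    """Linearly spaced beta values, inclusive of both endpoints."""
    if steps == 1:
        return [start]
    delta = (end - start) / (steps - 1)
    return [start + i * delta for i in range(steps)]


def cumulative_alphas(betas):
    """Running product of (1 - beta); index 0 is 1.0 (no noise added yet)."""
    out = [1.0]
    for beta in betas:
        out.append(out[-1] * (1.0 - beta))
    return out


def sample_gamma(alpha_bar, t, rng=random):
    """Draw gamma uniformly between the cumulative alphas at t and t-1 (1 <= t <= T)."""
    low, high = alpha_bar[t], alpha_bar[t - 1]
    return rng.uniform(low, high)


def noise_level_embedding(sqrt_gamma, dim, scale=5000.0):
    """Sinusoidal embedding of a scalar noise level.

    Assumption: `scale` is the base of the geometric frequency ladder,
    i.e. the place where vanilla DDPM uses 10,000.
    """
    half = dim // 2
    freqs = [math.exp(-math.log(scale) * i / half) for i in range(half)]
    sines = [math.sin(sqrt_gamma * f) for f in freqs]
    cosines = [math.cos(sqrt_gamma * f) for f in freqs]
    return sines + cosines
```

A training step would then pick `t`, call `sample_gamma`, and pass `math.sqrt(gamma)` to `noise_level_embedding` before the result is projected and injected into the UNet blocks.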

Result

64x64 to 256x256 Model

A. Settings

| Tag | Setting |
| --- | --- |
| Base Channel | 56 |
| Train Batch Size | 4 |
| Train Iterations | 500K |
| Train Data | DIV2K Train Set + Flickr2K Train Set (images 1001 to 2650) |
| Validation Data | DIV2K Validation Set |
| Test Data | Flickr2K Train Set (images 1 to 1000) |
| Train Data Augmentation | Random Crop, Random Flip, Random Rotation |
| Test Data Augmentation | Center Crop |
| Train Learning Rate Schedule | Cosine Annealing Schedule from 1e-5 to 1e-7 |
| Train Beta Schedule | Linear Schedule from 1e-4 to 0.005 |
| Sample Gamma Schedule | Linear Schedule from 1e-4 to 0.1 |
| Train Steps | 1000 |
| Sample Steps | 100 |
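For reference, the cosine annealing learning rate listed in the settings can be written in closed form. The sketch below uses the formula documented for PyTorch's `CosineAnnealingLR`. Whether the repository uses that scheduler, and whether it anneals over exactly the 500K training iterations, are assumptions. The function name is hypothetical.

```python
import math


def cosine_annealing_lr(step, total_steps, lr_max=1e-5, lr_min=1e-7):
    """Learning rate at `step`, decaying from lr_max to lr_min along a half cosine."""
    # Clamp so that steps outside [0, total_steps] return the endpoint values.
    step = min(max(step, 0), total_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return lr_min + (lr_max - lr_min) * cosine
```

With these defaults the rate starts at 1e-5, passes through roughly 5.05e-6 at the halfway point, and ends at 1e-7.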

B. Scores

| Dataset | IS (Mean, Std.) | FID | PSNR | SSIM |
| --- | --- | --- | --- | --- |
| Center crop, 64x64 to 256x256 | (12.829, 0.992) | 3.642 | 23.185 | 0.564 |
| Center crop, 256x256 to 1024x1024 | (21.305, 2.290) | 0.312 | 23.819 | 0.617 |

Note that this model was not trained on the 256x256 to 1024x1024 task.

The Inception Score is low because cropped images are hard to recognize as objects. As the crop size increases, the Inception Score also increases.
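As a reminder of what the PSNR column measures, here is a minimal sketch of the standard definition, 10 * log10(MAX^2 / MSE). It is not the repository's evaluation code. The 8-bit value range, the flat pixel layout, and the function name are all assumptions, and the actual evaluation may differ (for example in color space or border handling).

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """PSNR in dB between two equally sized flat sequences of pixel values."""
    if len(reference) != len(estimate):
        raise ValueError("inputs must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher values mean the sample is closer to the HR image pixel by pixel. Perceptual quality is what IS and FID try to capture instead.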

C. Samples

Note that the LR images below have been upsampled using bicubic interpolation.

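Validation (64x64 to 256x256)

| Tag | Image |
| --- | --- |
| LR | LR64_val |
| Sample | Sample64_val |
| HR | HR64_val |

Test (64x64 to 256x256)

| Tag | Image |
| --- | --- |
| LR | LR64 |
| Sample | Sample64 |
| HR | HR64 |

Test (256x256 to 1024x1024)

| Tag | Image |
| --- | --- |
| LR | LR256 |
| Sample | Sample256 |
| HR | HR256 |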

32x32 to 128x128 Model

| Dataset | IS (Mean, Std.) | FID | PSNR | SSIM |
| --- | --- | --- | --- | --- |
| Center crop, 32x32 to 128x128 | (7.159, 0.437) | 8.177 | 23.609 | 0.563 |

A. Settings

| Tag | Setting |
| --- | --- |
| Base Channel | 64 |
| Train Batch Size | 12 |
| Train Iterations | 500K |
| Train Data | DIV2K Train Set + Flickr2K Train Set (images 1001 to 2650) |
| Validation Data | DIV2K Validation Set |
| Test Data | Flickr2K Train Set (images 1 to 1000) |
| Train Data Augmentation | Random Crop, Random Flip, Random Rotation |
| Test Data Augmentation | Center Crop |
| Train Learning Rate Schedule | Cosine Annealing Schedule from 1e-5 to 1e-7 |
| Train Beta Schedule | Linear Schedule from 1e-4 to 0.005 |
| Sample Gamma Schedule | Linear Schedule from 1e-6 to 0.05 |
| Train Steps | 1000 |
| Sample Steps | 100 |

C. Samples

Note that the LR images below have been upsampled using bicubic interpolation.

Validation (32x32 to 128x128)

| Tag | Image |
| --- | --- |
| LR | LR-Val |
| Sample | Sample-Val |
| HR | HR-Val |

Test (32x32 to 128x128)

| Tag | Image |
| --- | --- |
| LR | LR |
| Sample | Sample |
| HR | HR |