The UNet structure is almost the same as in vanilla DDPM, with three exceptions: self-attention is applied at the last depth and the depth immediately before it, group normalization uses 8 groups instead of 32, and the linear scale of the embedding generation module is changed from 10,000 to 5,000. As described in the paper, the gamma value is sampled from a uniform distribution between the two alpha values at t-1 and t, and the square root of gamma is fed directly into the embedding generation module.
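The noise-level sampling and embedding described above can be sketched in plain Python. This is a minimal illustration, not the repository's code: the function names are hypothetical, "alpha values" is assumed to mean the cumulative product of alphas (with 1.0 prepended for t = 0), and the sin/cos layout of the embedding is an assumption.

```python
import math
import random


def sample_sqrt_gamma(alphas_cumprod, rng=random):
    """Pick a random step t, draw gamma uniformly between the values at t-1 and t."""
    # Assumption: alphas_cumprod[0] == 1.0 stands for t = 0, so the list has T + 1 entries.
    t = rng.randint(1, len(alphas_cumprod) - 1)  # randint is inclusive on both ends
    lo, hi = sorted((alphas_cumprod[t], alphas_cumprod[t - 1]))
    gamma = rng.uniform(lo, hi)
    return math.sqrt(gamma)


def noise_level_embedding(sqrt_gamma, dim, linear_scale=5000.0):
    """Sinusoidal embedding of a continuous noise level (layout is an assumption)."""
    half = dim // 2
    # Frequencies decay geometrically from 1 towards 1e-4.
    freqs = [1e-4 ** (i / half) for i in range(half)]
    args = [linear_scale * sqrt_gamma * f for f in freqs]
    return [math.sin(a) for a in args] + [math.cos(a) for a in args]


# Example with the training schedule from the settings (1000 steps, beta from 1e-4 to 0.005).
betas = [1e-4 + (0.005 - 1e-4) * i / 999 for i in range(1000)]
alphas_cumprod = [1.0]
for beta in betas:
    alphas_cumprod.append(alphas_cumprod[-1] * (1.0 - beta))

level = sample_sqrt_gamma(alphas_cumprod)
embedding = noise_level_embedding(level, dim=8)
```

Because the network is conditioned on a continuous noise level rather than a discrete step index, the number of sampling steps can differ from the number of training steps.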
Result
64x64 to 256x256 Model
A. Settings
| Tag | Setting |
| --- | --- |
| Base Channel | 56 |
| Train Batch Size | 4 |
| Train Iterations | 500K |
| Train Data | DIV2K Train Set + Flickr2K Train Set (images 1001 to 2650) |
| Validation Data | DIV2K Validation Set |
| Test Data | Flickr2K Train Set (images 1 to 1000) |
| Train Data Augmentation | Random Crop, Random Flip, Random Rotation |
| Test Data Augmentation | Center Crop |
| Train Learning Rate Schedule | Cosine Annealing Schedule from 1e-5 to 1e-7 |
| Train Beta Schedule | Linear Schedule from 1e-4 to 0.005 |
| Sample Gamma Schedule | Linear Schedule from 1e-4 to 0.1 |
| Train Steps | 1000 |
| Sample Steps | 100 |
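For reference, the training beta schedule and learning rate schedule listed above can be written out as follows. This is a rough sketch with hypothetical function names; it assumes the cosine annealing spans all 500K iterations without restarts, which the settings do not state explicitly.

```python
import math


def linear_beta_schedule(start=1e-4, end=0.005, steps=1000):
    """Evenly spaced betas from start to end, both endpoints included."""
    return [start + (end - start) * i / (steps - 1) for i in range(steps)]


def cumulative_alphas(betas):
    """Running product of (1 - beta), i.e. the signal level left after each step."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def cosine_annealing_lr(step, total_steps=500_000, lr_max=1e-5, lr_min=1e-7):
    """Closed-form cosine annealing from lr_max at step 0 to lr_min at total_steps."""
    cos_term = 1.0 + math.cos(math.pi * step / total_steps)
    return lr_min + 0.5 * (lr_max - lr_min) * cos_term


betas = linear_beta_schedule()
gammas = cumulative_alphas(betas)
halfway_lr = cosine_annealing_lr(250_000)  # midpoint between lr_max and lr_min
```

The sampling schedule uses fewer steps (100) with its own range, so it would be built separately rather than by subsampling the training schedule.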
B. Scores
| Dataset | IS (Mean, Std.) | FID | PSNR | SSIM |
| --- | --- | --- | --- | --- |
| center crop 64x64 to 256x256 | (12.829, 0.992) | 3.642 | 23.185 | 0.564 |
| center crop 256x256 to 1024x1024 | (21.305, 2.290) | 0.312 | 23.819 | 0.617 |
Note that this model was not trained on 256x256 to 1024x1024 upscaling.
The Inception Score is low because cropped images are hard to recognize as objects. As the crop size increases, the Inception Score also increases.
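As a reminder of what the PSNR column measures, here is a minimal sketch of the standard definition. The function name and the 8-bit peak value of 255 are assumptions; the exact evaluation settings used for the scores above (color space, value range, border handling) are not stated here and may differ.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized flat pixel sequences."""
    if len(reference) != len(estimate):
        raise ValueError("inputs must have the same number of pixels")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(max_value ** 2 / mse)


# Two tiny "images" flattened to pixel lists.
hr_pixels = [100, 100, 100, 100]
sr_pixels = [110, 90, 110, 90]
score = psnr(hr_pixels, sr_pixels)
```

Higher PSNR means lower mean squared error against the HR image, which does not always track perceptual quality; this is why IS and FID are reported alongside it.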
C. Samples
Note that the LR images below have been upsampled with bicubic interpolation.
Validation (64x64 to 256x256)
[Images: LR, Sample, HR]
Test (64x64 to 256x256)
[Images: LR, Sample, HR]
Test (256x256 to 1024x1024)
[Images: LR, Sample, HR]
32x32 to 128x128 Model
| Dataset | IS (Mean, Std.) | FID | PSNR | SSIM |
| --- | --- | --- | --- | --- |
| center crop 32x32 to 128x128 | (7.159, 0.437) | 8.177 | 23.609 | 0.563 |
A. Settings
| Tag | Setting |
| --- | --- |
| Base Channel | 64 |
| Train Batch Size | 12 |
| Train Iterations | 500K |
| Train Data | DIV2K Train Set + Flickr2K Train Set (images 1001 to 2650) |
| Validation Data | DIV2K Validation Set |
| Test Data | Flickr2K Train Set (images 1 to 1000) |
| Train Data Augmentation | Random Crop, Random Flip, Random Rotation |
| Test Data Augmentation | Center Crop |
| Train Learning Rate Schedule | Cosine Annealing Schedule from 1e-5 to 1e-7 |
| Train Beta Schedule | Linear Schedule from 1e-4 to 0.005 |
| Sample Gamma Schedule | Linear Schedule from 1e-6 to 0.05 |
| Train Steps | 1000 |
| Sample Steps | 100 |
C. Samples
Note that the LR images below have been upsampled with bicubic interpolation.