Single Image Super-Resolution with WDSR and EDSR

A Keras-based implementation of

Wide Activation for Efficient and Accurate Image Super-Resolution (WDSR), winner of the NTIRE 2018 super-resolution challenge.
Enhanced Deep Residual Networks for Single Image Super-Resolution (EDSR), winner of the NTIRE 2017 super-resolution challenge.

Setup

Create a new Conda environment with

conda env create -f environment-gpu.yml

if you have a GPU^*). A CPU-only environment can be created with

conda env create -f environment-cpu.yml

Activate the environment with

source activate wdsr

^*) It is assumed that appropriate CUDA and cuDNN versions for the current tensorflow-gpu version are already installed on your system.

Pre-trained models

Pre-trained models are available here. Each directory contains a model together with the training settings. All of them were trained with images 1-800 from the DIV2K training set using the specified downgrade operator. Random crops and transformations were made as described in the EDSR paper. Model performance is measured in dB PSNR on the DIV2K benchmark (images 801-900 of DIV2K validation set, RGB channels, without self-ensemble). See also section Training.
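For readers unfamiliar with the metric, the following is a minimal sketch of the standard PSNR definition in dB, assuming 8-bit pixel values (maximum 255) and images given as flat sequences of channel values. The function name psnr and its parameters are illustrative only; the repository's own evaluation code is not shown here and may differ in details such as border handling.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences.

    Illustrative helper (not part of this repository). Inputs are flat
    sequences of pixel values, e.g. all RGB channels of an image flattened.
    """
    if len(reference) != len(estimate):
        raise ValueError("inputs must have the same number of values")
    if not reference:
        raise ValueError("inputs must not be empty")
    # Mean squared error over all pixel values.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher values mean that the super-resolved image is closer to the HR reference; identical images yield infinity.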

Baseline models

Model	Scale	Residual blocks	Downgrade	Parameters	PSNR	Training
wdsr-a-8-x2¹⁾	x2	8	bicubic	0.89M	34.54 dB	settings
wdsr-a-16-x2¹⁾	x2	16	bicubic	1.19M	34.68 dB	settings
edsr-16-x2²⁾	x2	16	bicubic	1.37M	34.64 dB	settings

¹⁾ WDSR baseline(s), see also WDSR project page.
²⁾ EDSR baseline, see also EDSR project page.

Experimental models

Model	Scale	Residual blocks	Downgrade	Parameters	PSNR	Training
wdsr-a-32-x2	x2	32	bicubic	3.55M¹⁾	34.80 dB	settings
wdsr-a-32-x4	x4	32	bicubic	3.56M¹⁾	29.17 dB	settings
wdsr-a-32-x2-q90	x2	32	bicubic + JPEG (90)²⁾	3.55M¹⁾	32.12 dB	settings
wdsr-a-32-x4-q90	x4	32	bicubic + JPEG (90)²⁾	3.56M¹⁾	27.63 dB	settings
wdsr-b-32-x2	x2	32	bicubic	0.59M	34.63 dB	settings

¹⁾ For experimental WDSR-A models, an expansion ratio of 6 was used, increasing the number of parameters compared to an expansion ratio of 4. Please note that the default expansion ratio is 4 when using one of the wdsr-a-* profiles with the --profile command line option for training. The default expansion ratio for WDSR-B models is 6.
²⁾ JPEG compression with quality 90 in addition to bicubic downscale. See also section JPEG compression.

Demo

First, download the wdsr-a-32-x4 model. Assuming that the path to the downloaded model is ~/Downloads/wdsr-a-32-x4-psnr-29.1736.h5, the following command super-resolves images in directory ./demo with factor x4 and writes the results to directory ./output:

python demo.py -i ./demo -o ./output --model ~/Downloads/wdsr-a-32-x4-psnr-29.1736.h5

Below are figures that compare the super-resolution results (SR) with the corresponding low-resolution (LR) and high-resolution (HR) images and an x4 resize with bicubic interpolation. The demo images were cropped from images in the DIV2K validation set.

DIV2K dataset

Download

If you want to train and evaluate models, you must download the DIV2K dataset and extract the downloaded archives to a directory of your choice (DIV2K in the following example). The resulting directory structure should look like:

DIV2K
  DIV2K_train_HR
  DIV2K_train_LR_bicubic
    X2
    X3
    X4
  DIV2K_train_LR_unknown
    X2
    X3
    X4
  DIV2K_valid_HR
  DIV2K_valid_LR_bicubic
    ...
  DIV2K_valid_LR_unknown
    ...

You only need to download DIV2K archives for those downgrade operators (unknown, bicubic) and super-resolution scales (x2, x3, x4) that you'll actually use for training.
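Before converting or training, it can help to check that the directories for the chosen downgrade operator and scale are in place. The sketch below assumes that the HR and LR directories sit directly under the DIV2K root (as in paths like ./DIV2K/DIV2K_train_LR_bicubic used elsewhere in this README) and that both the training and validation subsets are needed. The helpers expected_dirs and missing_dirs are hypothetical and not part of this repository.

```python
from pathlib import Path


def expected_dirs(root, downgrade="bicubic", scale=2):
    """Return the DIV2K sub-directories needed for one downgrade operator and scale.

    Hypothetical helper; directory names follow the layout described above.
    """
    root = Path(root)
    dirs = []
    for subset in ("train", "valid"):
        dirs.append(root / f"DIV2K_{subset}_HR")
        dirs.append(root / f"DIV2K_{subset}_LR_{downgrade}" / f"X{scale}")
    return dirs


def missing_dirs(root, downgrade="bicubic", scale=2):
    """Return those expected directories that do not exist yet."""
    return [d for d in expected_dirs(root, downgrade, scale) if not d.is_dir()]
```

For example, missing_dirs("./DIV2K", "unknown", 4) returns an empty list once the HR archives and the unknown x4 LR archives have been extracted.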

Convert

Before the DIV2K images can be used they must be converted to numpy arrays and stored in a separate location. Conversion to numpy arrays dramatically reduces image loading times. Conversion can be done with the convert.py script:

python convert.py -i ./DIV2K -o ./DIV2K_BIN numpy

In this example, converted images are written to the DIV2K_BIN directory. You'll later refer to this directory with the --dataset command line option.

JPEG compression

There is experimental support for adding JPEG compression artifacts to LR images and training with these images. The following commands convert bicubic downscaled DIV2K training and validation images to JPEG images with quality 90:

python convert.py -i ./DIV2K/DIV2K_train_LR_bicubic \
                  -o ./DIV2K/DIV2K_train_LR_bicubic_jpeg_90 \
                  --jpeg-quality 90 jpeg

python convert.py -i ./DIV2K/DIV2K_valid_LR_bicubic \
                  -o ./DIV2K/DIV2K_valid_LR_bicubic_jpeg_90 \
                  --jpeg-quality 90 jpeg

After having converted these JPEG images to numpy arrays, as described in the previous section, models can be trained with the --downgrade bicubic_jpeg_90 option to additionally learn to recover from JPEG compression artifacts.

Training

WDSR and EDSR models can be trained by running train.py with the command line options and profiles described in train.py. For example, a WDSR-A baseline model with 8 residual blocks can be trained for scale x2 with

python train.py --dataset ./DIV2K_BIN --outdir ./output --profile wdsr-a-8 --scale 2

The --dataset option sets the location of the converted DIV2K dataset and the --outdir option sets the output directory (defaults to ./output). Each training run creates a timestamped sub-directory in the specified output directory. It contains the saved models, TensorBoard logs, and an args.txt file listing all command line options (default and user-defined). The scale factor is set with the --scale option. The downgrade operator is set with the --downgrade option; it defaults to bicubic and can be changed to unknown.
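Because every run gets its own timestamped sub-directory, a small script can locate the most recent one. The sketch below assumes that run directories are named in the form YYYYMMDD-HHMMSS, which is inferred from example names such as 20181016-063620 and is not confirmed by train.py. The function latest_run_dir is a hypothetical helper, not part of this repository.

```python
from datetime import datetime
from pathlib import Path

# Assumption: inferred from directory names like 20181016-063620.
RUN_DIR_FORMAT = "%Y%m%d-%H%M%S"


def latest_run_dir(outdir):
    """Return the most recent timestamped run directory in outdir, or None."""
    runs = []
    for path in Path(outdir).iterdir():
        if not path.is_dir():
            continue
        try:
            started = datetime.strptime(path.name, RUN_DIR_FORMAT)
        except ValueError:
            continue  # not a timestamped run directory
        runs.append((started, path))
    if not runs:
        return None
    # Pick the entry with the latest parsed timestamp.
    return max(runs, key=lambda run: run[0])[1]
```

Directories whose names do not parse as a timestamp are skipped, so other files or folders in the output directory do not interfere.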

By default, the model is validated against randomly cropped images from the DIV2K validation set. To evaluate the model against the full-sized DIV2K validation images (the benchmark) after each epoch instead, set the --benchmark command line option. This slows down training significantly, however, and only makes sense for smaller models. Alternatively, you can benchmark saved models later with bench.py as described in the section Evaluation.

To train models for higher scales (x3 or x4) it is recommended to re-use the weights of a model pre-trained for a smaller scale (x2). This can be done with the --pretrained-model option. For example,

python train.py --dataset ./DIV2K_BIN --outdir ./output --profile wdsr-a-8 --scale 4 \
    --pretrained-model ./output/20181016-063620/models/epoch-294-psnr-34.5394.h5

trains a WDSR-A baseline model with 8 residual blocks for scale x4 re-using the weights of model epoch-294-psnr-34.5394.h5, a WDSR-A baseline model with the same number of residual blocks trained for scale x2.

For a more detailed overview of available command line options and profiles please take a look at train.py.

Evaluation

An alternative to the --benchmark training option is to evaluate saved models with bench.py and then select the model with the highest PSNR. For example,

python bench.py -i ./output/20181016-063620/models -o bench.json

evaluates all models in directory ./output/20181016-063620/models and writes the results to bench.json. This JSON file maps model filenames to evaluation PSNR. The bench.py script also writes the best model in terms of PSNR to stdout at the end of evaluation:

Best PSNR = 34.5394 for model ./output/20181016-063620/models/epoch-294-psnr-37.4630.h5

The higher PSNR value in the model filename must not be confused with the value generated by bench.py. The PSNR value in the filename was generated during training by validating against smaller, randomly cropped images, which tends to yield higher PSNR values.
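The two PSNR values can be compared programmatically. The following sketch assumes that bench.json is a flat JSON object mapping each model filename to a numeric PSNR, and that filenames end in psnr-<value>.h5 as in the examples above. The exact file structure is an assumption, and the helpers load_results, best_model and training_psnr are hypothetical, not part of bench.py.

```python
import json
import re
from pathlib import Path


def load_results(path):
    """Load benchmark results, assumed to be a {filename: psnr} JSON object."""
    with open(path) as f:
        return json.load(f)


def best_model(results):
    """Return (filename, psnr) of the entry with the highest benchmark PSNR."""
    return max(results.items(), key=lambda item: item[1])


def training_psnr(filename):
    """Extract the PSNR embedded in a name like epoch-294-psnr-37.4630.h5."""
    match = re.search(r"psnr-(\d+(?:\.\d+)?)\.h5$", Path(filename).name)
    return float(match.group(1)) if match else None
```

For example, name, score = best_model(load_results("bench.json")) selects the best model, and training_psnr(name) - score gives the gap between the validation PSNR recorded during training and the benchmark PSNR.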

Tests

The test suite can be run with

pytest tests

Weight normalization

WDSR models are trained with weight normalization. This branch uses a modified Adam optimizer. The wip-conv2d-weight-norm branch instead uses a specialized Conv2DWeightNorm layer and a default Adam optimizer (experimental work inspired by the official WDSR TensorFlow port). The current plan is to replace this layer with a default Conv2D layer and a TensorFlow WeightNorm wrapper once the wrapper is officially available in a TensorFlow release.
