
Video Compression through Image Interpolation

Chao-Yuan Wu, Nayan Singhal, Philipp Krähenbühl.
In ECCV, 2018. [Project Page] [Paper]

Overview

A PyTorch implementation of a deep video compression codec.

Currently supported:

  • Training interpolation models with different offsets.
  • Evaluation of a single model (PSNR/MS-SSIM).
  • Some ablation study options.

Coming soon:

  • Entropy coding.
  • Evaluation of the combined model.

Dependencies

We conducted experiments in the following environment:

  • Linux
  • Python 3.6.3
  • PyTorch 0.3.0.post4
  • TITAN X GPUs with cuDNN.

Similar environments (e.g., OS X or Python 2) might work with small modifications, but they have not been tested.
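As a quick sanity check before training, the snippet below (a minimal sketch, not part of this repository) prints the installed PyTorch version and confirms that CUDA and cuDNN are visible; the experiments above were run against PyTorch 0.3.0.post4.

import torch

# Minimal environment check (illustrative only; not part of this repository).
# The experiments described above used PyTorch 0.3.0.post4 on Linux with CUDA GPUs.
print("PyTorch version:", torch.__version__)
print("CUDA available :", torch.cuda.is_available())
print("cuDNN enabled  :", torch.backends.cudnn.enabled)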

Getting started

We provide a demo training script that trains on 7 clips for 100 iterations and evaluates on a held-out clip. To run the demo, download the demo data and run train.sh 2 (the argument, 0, 1, or 2, specifies the level of the hierarchy); a sketch of the full sequence is shown below. The demo takes about 3 minutes.
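For reference, here is a minimal sketch of that sequence. It assumes the demo data has already been downloaded and unpacked into data/train and data/eval (the paths that appear in the loader messages of the expected output below); the script name train.sh and its hierarchy-level argument are as described above.

import os
import subprocess

# Minimal sketch of running the demo (assumes the demo data is already
# downloaded and unpacked into data/train and data/eval).
for d in ("data/train", "data/eval"):
    if not os.path.isdir(d):
        raise SystemExit("Missing %s; download and unpack the demo data first." % d)

# The single argument (0, 1, or 2) selects the level of the hierarchy.
subprocess.run(["bash", "train.sh", "2"], check=True)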

Expected output:

Creating loader for data/train...
448 images loaded.
	distance=1/2
Loader for 448 images (28 batches) created.
	Encoder fuse level: 1
	Decoder fuse level: 1
...
[TRAIN] Iter[1]; LR: 0.00025; Loss: 0.260715; Backprop: 0.2985 sec; Batch: 2.8358 sec
[TRAIN] Iter[2]; LR: 0.00025; Loss: 0.237539; Backprop: 0.2371 sec; Batch: 1.5466 sec
[TRAIN] Iter[3]; LR: 0.00025; Loss: 0.241159; Backprop: 0.3445 sec; Batch: 1.4208 sec
[TRAIN] Iter[4]; LR: 0.00025; Loss: 0.193481; Backprop: 0.2328 sec; Batch: 1.3091 sec
[TRAIN] Iter[5]; LR: 0.00025; Loss: 0.181479; Backprop: 0.2336 sec; Batch: 1.2742 sec
...
[TRAIN] Iter[99]; LR: 0.00025; Loss: 0.090678; Backprop: 0.2444 sec; Batch: 1.3960 sec
[TRAIN] Iter[100]; LR: 0.00025; Loss: 0.082984; Backprop: 0.2431 sec; Batch: 1.3988 sec
Loss at each step:
0.1758 0.0982 0.0620 0.0574 0.0579 0.0597 0.0653 0.0742 0.0846 0.0949
...
Start evaluation...
...
Creating loader for data/eval...
8 images loaded.
	distance=1/2
Loader for 8 images (8 batches) created.
...
Evaluation @iter 100 done in 24 secs
TVL Loss   : 0.12207	0.06993	0.06193	0.06525	0.06742	0.07027	0.07543	0.08148	0.08650	0.09003
TVL MS-SSIM: 0.61841	0.81475	0.85905	0.87109	0.87745	0.88022	0.87903	0.87486	0.86880	0.86132
TVL PSNR   : 28.02937	28.63096	28.87302	28.87184	28.77673	28.64452	28.44989	28.28644	28.19869	28.15354

Output images after different numbers of progressive compression iterations are stored in a directory called output, starting with the blurry output after 1 iteration:

silent_cif_0002.png_iter01.png

and ending with the sharper output after 10 iterations:

silent_cif_0002.png_iter10.png

Since the model has only been trained for 3 minutes, the results do not look great yet, but we can see that it is roughly starting to reconstruct the frames.
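To make the progression concrete, the sketch below (illustrative only) computes PSNR for each of the 10 progressive outputs against the original frame. The reconstruction file names follow the pattern shown above; the path to the ground-truth frame is a placeholder that you would replace with the actual frame from the demo data.

import numpy as np
from PIL import Image

# Illustrative sketch: PSNR of each progressive reconstruction vs. the original
# frame, where PSNR = 10 * log10(255^2 / MSE) for 8-bit images.
# "data/eval/silent_cif_0002.png" is a placeholder for the ground-truth frame.
reference = np.asarray(Image.open("data/eval/silent_cif_0002.png"), dtype=np.float64)

for it in range(1, 11):
    recon = np.asarray(
        Image.open("output/silent_cif_0002.png_iter%02d.png" % it), dtype=np.float64)
    mse = np.mean((reference - recon) ** 2)
    psnr = 10 * np.log10(255.0 ** 2 / mse)
    print("iteration %2d: PSNR = %.2f dB" % (it, psnr))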

The final result using the full training set will look like:

vtl_silent_0.289.png

Please see our Project Page for more examples.

To train or evaluate on additional datasets, please see DATA.md for details and instructions.

Model weights and logs

Model weights are available here. The associated logs are available here.

Kinetics video IDs

The list of Kinetics videos we used for train/val/test is available here.

Citation

If you find this model useful for your research, please use the following BibTeX entry.

@inproceedings{wu2018vcii,
  title={Video Compression through Image Interpolation},
  author={Wu, Chao-Yuan and Singhal, Nayan and Kr{\"a}henb{\"u}hl, Philipp},
  booktitle={ECCV},
  year={2018}
}

Acknowledgment

This implementation largely borrows from pytorch-image-comp-rnn by Biao Zhang (1zb). The U-Net implementation borrows from Pytorch-UNet by Milesi Alexandre. Thank you, Biao and Milesi Alexandre!