QRes-VAE (Quantized ResNet VAE) is a neural network model for lossy image compression. It is based on the ResNet VAE architecture.
- Paper: Lossy Image Compression with Quantized Hierarchical VAEs
- arXiv: https://arxiv.org/abs/2208.13056
- Progressive coding: the QRes-VAE model learns a hierarchy of features. It compresses/decompresses images in a coarse-to-fine fashion.
Note: images below are from the CelebA dataset and COCO dataset, respectively.
- Lossy compression efficiency: the QRes-VAE model has competitive rate-distortion performance, especially at higher bit rates.
Requirements:
- Python, `pytorch>=1.9`, `tqdm`, `compressai` (link), `timm>=0.5.4` (link).
- Code has been tested in all of the following environments:
    - Both Windows and Linux, with Intel CPUs and Nvidia GPUs
    - Python 3.9
    - `pytorch` 1.9, 1.10, and 1.11 with CUDA 11.3
    - `pytorch` 1.12 with CUDA 11.6, in which models run faster (both training and testing) than in previous versions
Download:
- Download the repository;
- Download the pre-trained model checkpoints and put them in the `checkpoints` folder. See `checkpoints/README.md` for the expected folder structure.
    - QRes-VAE (34M) [Google Drive]: our main model for natural image compression.
    - QRes-VAE (17M) [Google Drive]: a smaller model trained on the CelebA dataset for an ablation study.
    - QRes-VAE (34M, lossless) [Google Drive]: a lossless compression model. Better than PNG but not as good as WebP.
The `lmb` in the folder names is the multiplier for MSE during training, i.e., `loss = rate + lmb * mse`. A larger `lmb` produces a higher bit rate but lower distortion.
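To make the trade-off concrete, here is a minimal sketch of the objective above. The rate and MSE values below are made-up illustrative numbers, not outputs of the model:

```python
def rd_loss(rate: float, mse: float, lmb: float) -> float:
    """Rate-distortion training objective: loss = rate + lmb * mse."""
    return rate + lmb * mse

# For the same (rate, mse) pair, a larger lmb penalizes distortion more
# heavily, so training is pushed toward lower MSE at a higher bit rate.
low = rd_loss(rate=0.30, mse=0.002, lmb=64)     # 0.30 + 0.128 = 0.428
high = rd_loss(rate=0.30, mse=0.002, lmb=2048)  # 0.30 + 4.096 = 4.396
```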
Usage:
- Compression and decompression (lossy): see `demo.ipynb`.
- Compression and decompression (lossless): `experiments/demo-lossless.ipynb`
- Progressive decoding: `experiments/progressive-decoding.ipynb`
- Sampling: `experiments/uncond-sampling.ipynb`
- Latent space interpolation: `experiments/latent-interpolation.ipynb`
- Inpainting: `experiments/inpainting.ipynb`
- Rate-distortion: `python evaluate.py --root /path/to/dataset`
- BD-rate: `experiments/bd-rate.ipynb`
- Estimate end-to-end FLOPs: `experiments/estimate-flops.ipynb`
Training is done by minimizing the `stats['loss']` term returned by the model's `forward()` function.
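The training pattern can be sketched as follows. This is a minimal illustration, not the repository's actual training code: `TinyModel` is a stand-in module, and the only property it shares with the real model is that `forward()` returns a dict whose `'loss'` entry is minimized:

```python
import torch
from torch import nn
import torch.nn.functional as F

class TinyModel(nn.Module):
    """Stand-in for the real model; forward() returns a stats dict."""
    def __init__(self, lmb=1024.0):
        super().__init__()
        self.conv = nn.Conv2d(3, 3, kernel_size=3, padding=1)
        self.lmb = lmb

    def forward(self, im):
        x_hat = self.conv(im)          # stand-in "reconstruction"
        mse = F.mse_loss(x_hat, im)
        rate = x_hat.abs().mean()      # placeholder for the true bit-rate term
        return {'loss': rate + self.lmb * mse, 'mse': mse, 'rate': rate}

model = TinyModel()
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)

for step in range(2):                  # a couple of dummy steps
    im = torch.rand(4, 3, 64, 64)      # batch of random crops in [0, 1]
    stats = model(im)
    optimizer.zero_grad()
    stats['loss'].backward()           # minimize stats['loss']
    optimizer.step()
```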
We used the COCO dataset for training, and the Kodak images for periodic evaluation.
- COCO: https://cocodataset.org
- Kodak: http://r0k.us/graphics/kodak
The file `train.py` is a simple example script for single-GPU training. To train the `qres34m` model with `lmb=1024`:
```
python train.py --model qres34m --lmb 1024 --train_root /path/to/coco/train2017 --train_crop 256 \
    --val_root /path/to/kodak --batch_size 64 --workers 4
```
In case of a `CUDA error: out of memory`, try reducing the batch size (and the learning rate accordingly):
```
python train.py --model qres34m --lmb 1024 --train_root /path/to/coco/train2017 --train_crop 256 \
    --val_root /path/to/kodak --batch_size 16 --lr 1e-4 --workers 4
```
TBD
The code has a non-commercial license, as found in the LICENSE file.
TBD