This repository contains the authors' implementation of several deep learning-based methods related to lossy image compression.
- **Lossy Image Compression with Quantized Hierarchical VAEs** [arxiv] [cvf] [ieee]
    - Published at WACV 2023, Best Algorithms Paper Award
    - Abstract: a 12-layer VAE model named QRes-VAE with good compression performance.
    - Code & pre-trained models: `lossy-vae/lvae/models/qres`
- **QARV: Quantization-Aware ResNet VAE for Lossy Image Compression** [arxiv] [ieee]
    - Published at TPAMI 2023
    - Abstract: an improved version of the previous model, with variable-rate support, faster decoding, and better performance.
    - Code & pre-trained models: `lossy-vae/lvae/models/qarv`
- **An Improved Upper Bound on the Rate-Distortion Function of Images** [arxiv] [ieee]
    - Published at ICIP 2023
    - Abstract: a 15-layer VAE model used to estimate the information R(D) function, showing that a -30% BD-rate w.r.t. VTM is theoretically achievable.
    - Code & pre-trained models: `lossy-vae/lvae/models/rd`
- **Progressive coding:** our models learn a deep hierarchy of latent variables and compress/decompress images in a coarse-to-fine fashion. This feature comes from the hierarchical nature of ResNet VAEs.
- **Compression performance:** our models are strong in terms of both rate-distortion performance and decoding speed. Please see the results below.
- Kodak images: `lossy-vae/results/kodak`
- Tecnick TESTIMAGES RGB 1200x1200: `lossy-vae/results/tecnick-rgb-1200`
- CLIC 2022 test set: `lossy-vae/results/clic2022-test`
Notes on metric computation:
- Bpp and PSNR are first computed for each image and then averaged over all images in a dataset.
- Bpp is the saved file size (in bits) divided by the number of image pixels.
- PSNR is computed in RGB space (not YUV).
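The two conventions above can be sketched as follows (the helper names are illustrative, not part of this repository):

```python
import math

def bpp(file_size_bytes: int, num_pixels: int) -> float:
    # Bits per pixel: compressed file size in bits / number of image pixels
    return file_size_bytes * 8 / num_pixels

def psnr_rgb(mse: float, max_val: float = 255.0) -> float:
    # PSNR from the mean squared error over all RGB values (not YUV)
    return 10 * math.log10(max_val ** 2 / mse)

# Example: a hypothetical 64,000-byte file for a 512x768 Kodak image
print(round(bpp(64_000, 512 * 768), 3))  # 1.302
print(round(psnr_rgb(4.0), 2))           # 42.11
```

Per-image bpp and PSNR computed this way are then averaged over all images in the dataset.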
| Model Name | CPU* Enc. | CPU* Dec. | 3080 Ti Enc. | 3080 Ti Dec. | BD-rate* (lower is better) |
|---|---|---|---|---|---|
| `qres34m` | 0.899s | 0.441s | 0.116s | 0.083s | -3.95% |
| `qarv_base` | 0.757s | 0.295s | 0.096s | 0.063s | -7.26% |
*Time is the latency to encode/decode a 512x768 image, averaged over the 24 Kodak images. Tested in plain PyTorch (v1.13 + CUDA 11.7) code, i.e., no mixed precision, TorchScript, ONNX/TensorRT, etc.
*CPU is an Intel Core i7-10700K.
*BD-rate is w.r.t. VTM 18.0, averaged over three common test sets (Kodak, Tecnick TESTIMAGES, and the CLIC 2022 test set).
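BD-rate compares two rate-distortion curves by averaging their log-rate difference over the overlapping PSNR range. Below is a minimal sketch of the standard Bjøntegaard computation (not necessarily this repository's exact implementation, which may use piecewise-cubic interpolation instead of a single cubic fit):

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test) -> float:
    """Average bitrate difference (%) of the test codec vs. the anchor
    at equal PSNR; negative means the test codec uses fewer bits."""
    log_ra = np.log(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log(np.asarray(rate_test, dtype=float))
    # Fit log-rate as a cubic polynomial of PSNR for each codec
    poly_a = np.polyfit(psnr_anchor, log_ra, 3)
    poly_t = np.polyfit(psnr_test, log_rt, 3)
    # Average each fit over the overlapping PSNR interval
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    int_a, int_t = np.polyint(poly_a), np.polyint(poly_t)
    avg_a = (np.polyval(int_a, hi) - np.polyval(int_a, lo)) / (hi - lo)
    avg_t = (np.polyval(int_t, hi) - np.polyval(int_t, lo)) / (hi - lo)
    return float((np.exp(avg_t - avg_a) - 1) * 100)

# A codec that always spends 10% fewer bits at the same PSNR: BD-rate = -10%
psnr = [30.0, 33.0, 36.0, 39.0]
anchor = [0.2, 0.4, 0.8, 1.6]
test = [r * 0.9 for r in anchor]
print(round(bd_rate(anchor, psnr, test, psnr), 2))  # -10.0
```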
Requirements:
- Python
- PyTorch >= 1.9 : https://pytorch.org/get-started/locally
- tqdm : `conda install tqdm`
- CompressAI : https://github.com/InterDigitalInc/CompressAI
- timm >= 0.8.0 : https://github.com/huggingface/pytorch-image-models
Download and Install:
- Download the repository.
- Modify the dataset paths in `lossy-vae/lvae/paths.py`.
- [Optional] pip install the repository in development mode:

```bash
cd /path/to/lossy-vae
python -m pip install -e .
```
Load a pre-trained model:

```python
from lvae import get_model

model = get_model('qarv_base', pretrained=True)  # weights are downloaded automatically
model.eval()
model.compress_mode(True)  # initialize entropy coding
```
Encode an image:
```python
model.compress_file('/path/to/image.png', '/path/to/compressed.bits')
```
Decode an image:
```python
im = model.decompress_file('/path/to/compressed.bits')
# im is a torch.Tensor of shape (1, 3, H, W), RGB, pixel values in [0, 1]
```
COCO
- Download the COCO dataset "2017 Train images [118K/18GB]" from https://cocodataset.org/#download
- Unzip the images anywhere, e.g., at `/path/to/datasets/coco/train2017`
- Edit `lossy-vae/lvae/paths.py` such that `known_datasets['coco-train2017'] = '/path/to/datasets/coco/train2017'`
Kodak (link), Tecnick TESTIMAGES (link), and CLIC (link)
```bash
python scripts/download-dataset.py --name kodak --datasets_root /path/to/datasets
```

Use `--name clic2022-test` or `--name tecnick` for the other two datasets.
Then, edit `lossy-vae/lvae/paths.py` such that `known_datasets['kodak'] = '/path/to/datasets/kodak'`, and similarly for the other datasets.
Custom Dataset
- Prepare a folder containing images. The folder should contain only images (organizing them in subfolders is fine).
- Edit `lossy-vae/lvae/paths.py` such that `known_datasets['custom-name'] = '/path/to/my-dataset'`, where `custom-name` is the name of your dataset and `/path/to/my-dataset` is the path to the folder containing the images.
- Then, you can use `custom-name` as the dataset name in the training/evaluation scripts.
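A folder prepared this way can be sanity-checked with a short script. The helpers below are illustrative and not part of the repository; the extension list is an assumption about which formats the training scripts accept:

```python
from pathlib import Path

# Assumed set of accepted image extensions; adjust to your data
IMAGE_EXTS = {'.png', '.jpg', '.jpeg', '.bmp', '.webp'}

def list_images(root) -> list:
    # Recursively collect image files; subfolders are allowed
    return sorted(p for p in Path(root).rglob('*')
                  if p.suffix.lower() in IMAGE_EXTS)

def check_dataset(root) -> int:
    # Warn about non-image files, which training scripts may reject
    files = [p for p in Path(root).rglob('*') if p.is_file()]
    images = list_images(root)
    if len(images) < len(files):
        print(f'warning: {len(files) - len(images)} non-image files found')
    return len(images)
```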
Training and evaluation scripts vary from model to model. For example, `qres34m` uses a fixed-rate train/eval scheme, while `qarv_base` uses a variable-rate one. Detailed training/evaluation instructions are provided in each model's subfolder (see the Models section above).
Code in this repository is freely available for non-commercial use.