VIT-VQGAN

This is an unofficial PyTorch implementation of both ViT-VQGAN and RQ-VAE. ViT-VQGAN is a simple ViT-based vector-quantized autoencoder, while RQ-VAE introduces a new residual quantization scheme. Further details can be found in the papers.
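To make the difference between the two quantization schemes concrete, here is a minimal pure-Python sketch. It is not the code used in this package: the function names `nearest_code` and `residual_quantize`, the toy codebook, and the choice of one codebook shared across all depths are assumptions made for illustration. Plain vector quantization replaces a vector with its nearest codebook entry; residual quantization repeats that step on whatever is left over, so each extra depth refines the approximation.

```python
# Illustrative sketch only; names and the toy codebook are hypothetical
# and do not come from the vitvqgan package.
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nearest_code(x: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codebook entry closest to x (squared L2)."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    return min(range(len(codebook)), key=lambda k: sq_dist(x, codebook[k]))


def residual_quantize(
    x: Vector, codebook: Sequence[Vector], depth: int
) -> Tuple[List[int], List[float]]:
    """Quantize x with `depth` rounds of nearest-code lookup.

    With depth=1 this is plain vector quantization. Each further round
    quantizes the residual left by the previous rounds (assumption: the
    same codebook is reused at every depth).
    """
    residual = list(x)
    approx = [0.0] * len(x)
    codes: List[int] = []
    for _ in range(depth):
        k = nearest_code(residual, codebook)
        codes.append(k)
        # Add the chosen code to the running approximation ...
        approx = [a + c for a, c in zip(approx, codebook[k])]
        # ... and subtract it from what still has to be explained.
        residual = [r - c for r, c in zip(residual, codebook[k])]
    return codes, approx


if __name__ == "__main__":
    toy_codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.25, 0.25)]
    x = (1.2, 0.3)
    print(residual_quantize(x, toy_codebook, depth=1))  # plain VQ
    print(residual_quantize(x, toy_codebook, depth=2))  # residual quantization
```

In the real models the vectors are encoder outputs and the codebook is learned during training; the sketch only shows the lookup step.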

Installation

pip install vitvqgan

Training

Stage 1 - VQ Training:

python -m vitvqgan.train_vim

You can add more options too:

python -m vitvqgan.train_vim -c imagenet_vitvq_small -lr 0.00001 -e 10

For demo purposes, it uses Imagenette as the training dataset. To change the dataset, modify the dataloader init file.

Inference:

Download the checkpoints (listed under Checkpoints) into the mbin folder.
Run the following command:

python -m vitvqgan.demo_recon

Checkpoints

ViT-VQGAN Small
ViT-VQGAN Base

Acknowledgements

This repo is modified from here, with dependencies updated to their latest versions and changes that make it easy to run on a consumer-grade GPU for learning purposes.

henrywoo/vim
