This is an unofficial PyTorch implementation of both ViT-VQGAN and RQ-VAE. ViT-VQGAN is a simple ViT-based Vector Quantized AutoEncoder, while RQ-VAE introduces a new residual quantization scheme. Further details can be found in the papers.
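To give an intuition for the residual quantization idea behind RQ-VAE, here is a minimal NumPy sketch (not the repo's actual implementation): each level quantizes the residual left over by the previous levels, so a stack of D codes approximates one latent vector increasingly well.

```python
import numpy as np

def residual_quantize(z, codebook, depth=4):
    """Greedy residual quantization sketch (hypothetical helper,
    not this repo's API).

    z:        (N, C) latent vectors
    codebook: (K, C) shared codebook
    Returns (N, depth) code indices and the (N, C) reconstruction."""
    residual = z.astype(float).copy()
    recon = np.zeros_like(residual)
    codes = []
    for _ in range(depth):
        # distance from each residual to every codebook entry
        dists = np.linalg.norm(residual[:, None, :] - codebook[None, :, :], axis=-1)
        idx = dists.argmin(axis=1)          # nearest code per vector
        q = codebook[idx]                   # quantized residual
        codes.append(idx)
        recon += q                          # accumulate over levels
        residual -= q                       # next level sees what's left
    return np.stack(codes, axis=1), recon
```

With depth=1 this reduces to ordinary vector quantization as used in ViT-VQGAN; increasing the depth shrinks the reconstruction error without enlarging the codebook.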
Installation:
pip install vitvqgan
Stage 1 - VQ Training:
python -m vitvqgan.train_vim
You can also pass additional options, such as a config name, learning rate, and number of epochs:
python -m vitvqgan.train_vim -c imagenet_vitvq_small -lr 0.00001 -e 10
It uses Imagenette as the training dataset for demo purposes; to change it, modify the dataloader init file.
Inference:
- Download the checkpoints from above into the mbin folder.
- Run the following command:
python -m vitvqgan.demo_recon
The repo is modified from here, updated to the latest dependencies and adapted to run easily on a consumer-grade GPU for learning purposes.