An Updated ViT-VQGAN Implementation for Learning Purposes

Primary language: Python. License: MIT.

ViT-VQGAN

This is an unofficial PyTorch implementation of both ViT-VQGAN and RQ-VAE. ViT-VQGAN is a simple ViT-based vector-quantized autoencoder, while RQ-VAE introduces a new residual quantization scheme. Further details can be found in the respective papers.
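To make the difference between plain vector quantization and residual quantization concrete, here is a minimal pure-Python sketch. It is not the code used in this repo: the function names, the toy codebook, and the use of one codebook shared across all depths are assumptions made for illustration, and a real model would do this on batched tensors with a learned codebook.

```python
# Illustrative sketch only; names and values here are hypothetical, not this repo's API.

def nearest_code(vector, codebook):
    """Plain VQ step: return (index, code) of the entry closest to `vector` (squared L2)."""
    def sq_dist(code):
        return sum((v - c) ** 2 for v, c in zip(vector, code))

    index = min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))
    return index, codebook[index]


def residual_quantize(vector, codebook, depth):
    """Residual quantization: quantize, subtract the chosen code, then quantize what is left."""
    residual = list(vector)
    approx = [0.0] * len(vector)
    indices = []
    for _ in range(depth):
        index, code = nearest_code(residual, codebook)
        indices.append(index)
        # The running sum of selected codes is the reconstruction so far.
        approx = [a + c for a, c in zip(approx, code)]
        # The next round only has to encode the remaining error.
        residual = [r - c for r, c in zip(residual, code)]
    return indices, approx


# Toy 2-D codebook and latent vector (made-up values).
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.25, 0.25]]
z = [1.2, 0.3]

vq_index, vq_code = nearest_code(z, codebook)                # single code per vector
indices, approx = residual_quantize(z, codebook, depth=2)    # stack of codes per vector
```

With `depth=1` the residual scheme reduces to plain vector quantization. Each extra depth refines the approximation without enlarging the codebook.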

Installation

pip install vitvqgan 

Training

Stage 1 - VQ Training:

python -m vitvqgan.train_vim

You can add more options too:

python -m vitvqgan.train_vim -c imagenet_vitvq_small -lr 0.00001 -e 10

For demo purposes, training uses Imagenette as the dataset. To change it, modify the dataloader init file.

Inference:

  • Download the checkpoints (see Checkpoints) into the mbin folder.
  • Run the following command:
python -m vitvqgan.demo_recon

Checkpoints

Acknowledgements

This repo is modified from here, with dependencies updated to their latest versions and changes that make it easy to run on a consumer-grade GPU for learning purposes.