
CODA: Repurposing Continuous VAEs for Discrete Tokenization


This is the official implementation of CODA, introduced in CODA: Repurposing Continuous VAEs for Discrete Tokenization.

🔆 Highlights

We identify that training conventional VQ tokenizers is inherently challenging, as it requires both compressing visual signals into a compact representation and discretizing them into a fixed set of codes. This often leads to unstable training, low codebook utilization, and limited reconstruction quality. Instead of training discrete tokenizers from scratch, we introduce CODA (COntinuous-to-Discrete Adaptation), which adapts off-the-shelf continuous VAEs, already optimized for perceptual compression, into discrete tokenizers via a carefully designed discretization process. This ensures stable and efficient training while retaining the strong visual fidelity of continuous VAEs.
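For intuition, the snippet below is a minimal sketch of the general continuous-to-discrete idea: latents from a frozen continuous VAE encoder are mapped to their nearest entries in a fixed codebook. It is an illustration only, not CODA's actual discretization scheme; the shapes and names here are hypothetical placeholders.

import torch

def quantize_latents(z, codebook):
    # z:        (B, C, H, W) latents from a frozen continuous VAE encoder
    # codebook: (V, C) code embeddings, V = vocabulary size
    b, c, h, w = z.shape
    flat = z.permute(0, 2, 3, 1).reshape(-1, c)   # (B*H*W, C)
    dists = torch.cdist(flat, codebook)           # distance to every code
    indices = dists.argmin(dim=1)                 # nearest code per latent
    z_q = codebook[indices].view(b, h, w, c).permute(0, 3, 1, 2)
    return indices.view(b, h, w), z_q             # discrete tokens + quantized latents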

🔧 Usage

Tokenizer

Install the corresponding environment with

git clone git@github.com:LeapLabTHU/CODA.git
cd CODA/tokenizer
pip install -r requirements.txt

Prepare the required pretrained models and dataset

  1. Prepare the ImageNet dataset and replace PATH_TO_IMAGENET with the corresponding path on your machine. The expected layout is shown below, followed by a minimal loading sketch.
  2. Prepare the pretrained models: the MAR VAE, the FLUX VAE, and the StyleGAN DINO discriminator, arranged as in the checkpoints tree further below.
data
├── train
│   ├── folder 1 (class 1)
│   ├── folder 2 (class 2)
│   ├── ...
|
├── val
│   ├── folder 1 (class 1)
│   ├── folder 2 (class 2)
│   ├── ...
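This layout matches what torchvision's ImageFolder expects, so the dataset can be sanity-checked with a few lines. The transform here is illustrative; the actual preprocessing is defined in the training code.

from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(256),
    transforms.ToTensor(),
])
train_set = datasets.ImageFolder("PATH_TO_IMAGENET/train", transform=transform)
print(len(train_set), "training images,", len(train_set.classes), "classes")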


checkpoints
├── mar_vae
│   ├── kl16.safetensors
|
├── flux_vae
│   ├── config.json
│   ├── diffusion_pytorch_model.safetensors
|
├── dino_disc
│   ├── dino_deitsmall16_pretrain.safetensors
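As a quick sanity check that the checkpoints are in place, .safetensors files can be inspected with the safetensors library (the path follows the tree above):

from safetensors.torch import load_file

state_dict = load_file("checkpoints/mar_vae/kl16.safetensors")
print(f"Loaded {len(state_dict)} tensors from the MAR VAE checkpoint")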

Training

bash run.sh

See run.sh for the detailed configurations of the MAR- and FLUX-based models.

📚 Model Zoo

Model              Link
MAR,  $V=16384$    link
FLUX, $V=65536$    link
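The vocabulary size $V$ determines how much information each discrete token carries: $\log_2 16384 = 14$ bits for the MAR-based tokenizer and $\log_2 65536 = 16$ bits for the FLUX-based one, verifiable in one line:

import math
print(math.log2(16384), math.log2(65536))  # 14.0 and 16.0 bits per token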

🔎 Code Release

  • Generation training code & checkpoints
  • Tokenizer checkpoints
  • Tokenizer training code

📬 Contact

For questions, feel free to reach out: liuzeyu24@mails.tsinghua.edu.cn

🔖 Acknowledgements

Our implementation is based on vaex, VQGAN, SEED-Voken, MAR, pytorch-fid.

We thank the authors for their excellent work.