This is the official implementation of CODA, introduced in CODA: Repurposing Continuous VAEs for Discrete Tokenization.
We identify that training conventional VQ tokenizers is inherently challenging, as it requires both compressing visual signals into a compact representation and discretizing them into a fixed set of codes. This often leads to unstable training, low codebook utilization, and limited reconstruction quality. Instead of training discrete tokenizers from scratch, we introduce CODA (COntinuous-to-Discrete Adaptation), which adapts off-the-shelf continuous VAEs, already optimized for perceptual compression, into discrete tokenizers via a carefully designed discretization process. This ensures stable and efficient training while retaining the strong visual fidelity of continuous VAEs.
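As a rough illustration of the continuous-to-discrete idea (not the actual CODA training pipeline; see the code in this repo for that), the sketch below keeps a stand-in continuous encoder frozen and learns only a nearest-neighbor codebook lookup on top of its latents. All module and variable names here are hypothetical.

```python
# Rough illustration only -- not the CODA training code. It shows the general
# continuous-to-discrete idea: keep a pretrained continuous encoder frozen and
# learn only a quantization step over its latents. All names are hypothetical.
import torch
import torch.nn as nn

class LatentQuantizer(nn.Module):
    """Nearest-neighbor quantization of continuous latents against a codebook."""
    def __init__(self, num_codes: int = 1024, dim: int = 16):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z: torch.Tensor):
        # z: (B, C, H, W) continuous latents from a frozen encoder
        b, c, h, w = z.shape
        flat = z.permute(0, 2, 3, 1).reshape(-1, c)       # (B*H*W, C)
        dist = torch.cdist(flat, self.codebook.weight)    # L2 distance to each code
        codes = dist.argmin(dim=1)                        # discrete token ids
        z_q = self.codebook(codes).view(b, h, w, c).permute(0, 3, 1, 2)
        # Straight-through estimator: the non-differentiable argmin is bypassed
        # in the backward pass. Codebook/commitment losses are omitted here.
        z_q = z + (z_q - z).detach()
        return z_q, codes.view(b, h, w)

# Toy usage with a stand-in "frozen encoder"; in practice this would be a
# pretrained continuous VAE such as the MAR or FLUX VAE prepared below.
frozen_encoder = nn.Conv2d(3, 16, kernel_size=8, stride=8).eval()
for p in frozen_encoder.parameters():
    p.requires_grad_(False)

quantizer = LatentQuantizer(num_codes=1024, dim=16)
images = torch.randn(2, 3, 64, 64)
with torch.no_grad():
    z = frozen_encoder(images)
z_q, codes = quantizer(z)
print(z_q.shape, codes.shape)  # (2, 16, 8, 8) and (2, 8, 8)
```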
Install the required environment with:

```bash
git clone git@github.com:LeapLabTHU/CODA.git
cd CODA/tokenizer
pip install -r requirements.txt
```
Prepare the required dataset and pretrained models:
- Prepare the ImageNet dataset and replace `PATH_TO_IMAGENET` with the corresponding path on your machine. The expected directory layout (and a small sanity-check sketch) is shown below.
- Prepare the pretrained models: the MAR VAE, the FLUX VAE, and the StyleGAN DINO discriminator, placed as in the `checkpoints` tree below (a loading sketch follows it).
```
data
├── train
│   ├── folder 1 (class 1)
│   ├── folder 2 (class 2)
│   ├── ...
├── val
│   ├── folder 1 (class 1)
│   ├── folder 2 (class 2)
│   ├── ...
```
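This is the standard class-per-folder layout, so as an optional, repo-independent sanity check it can be read with torchvision's `ImageFolder`; the transform below is only an example.

```python
# Optional sanity check (not part of the repo): verify that PATH_TO_IMAGENET
# follows the class-per-folder layout expected by torchvision's ImageFolder.
from torchvision import datasets, transforms

PATH_TO_IMAGENET = "/path/to/imagenet"  # replace with your local path

train_set = datasets.ImageFolder(
    root=f"{PATH_TO_IMAGENET}/train",
    transform=transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(256),
        transforms.ToTensor(),
    ]),
)
print(len(train_set), "training images across", len(train_set.classes), "classes")
```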
```
checkpoints
├── mar_vae
│   ├── kl16.safetensors
├── flux_vae
│   ├── config.json
│   ├── diffusion_pytorch_model.safetensors
├── dino_disc
│   ├── dino_deitsmall16_pretrain.safetensors
```
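As a quick way to confirm the files are in place (this is not the repo's actual loading code), the snippet below reads the raw safetensors weights and loads the FLUX VAE through diffusers' `AutoencoderKL`; the MAR VAE and DINO discriminator model classes are defined inside this repo and are not instantiated here.

```python
# Illustrative check only: confirm the prepared checkpoints are readable.
from safetensors.torch import load_file
from diffusers import AutoencoderKL

# MAR VAE weights (state dict only; the matching model class comes from this repo)
mar_vae_state = load_file("checkpoints/mar_vae/kl16.safetensors")
print(len(mar_vae_state), "tensors in kl16.safetensors")

# FLUX VAE in diffusers format (config.json + diffusion_pytorch_model.safetensors)
flux_vae = AutoencoderKL.from_pretrained("checkpoints/flux_vae")
print(flux_vae.config.latent_channels, "latent channels")

# DINO discriminator backbone weights
dino_state = load_file("checkpoints/dino_disc/dino_deitsmall16_pretrain.safetensors")
```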
Training

```bash
bash run.sh
```

See `run.sh` for the detailed configurations of the MAR- and FLUX-based models.
| Model | Link |
|---|---|
| MAR | link |
| FLUX | link |
- Generation training code & checkpoints
- Tokenizer checkpoints
- Tokenizer training code
For questions, please contact liuzeyu24@mails.tsinghua.edu.cn.
Our implementation is based on vaex, VQGAN, SEED-Voken, MAR, and pytorch-fid.
We thank the authors for their excellent work.