jy0205/LaVIT

Training details of the codebook

Closed this issue · 1 comment

Thanks for your interesting work! I tried to reproduce the Dynamic Visual Tokenizer, but my reconstruction loss plateaus around 0.3. Could you give me some suggestions for training? Thanks!

Hi, thanks for your attention! Here are some tricks we used in our training:

  1. Reduce the codebook dimension (32 is enough) and use k-means initialization for the codebook.
  2. Update the codebook with EMA (exponential moving average) instead of training it directly by gradient descent.
  3. Train the token selector and merger first, i.e., without vector quantization enabled.
  4. Once the selector and merger are trained, enable vector quantization and update all modules end-to-end so the codebook is learned.
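To make tricks 1 and 2 concrete, here is a minimal sketch of an EMA-updated codebook with k-means initialization. This is not the LaVIT implementation; all class and method names are hypothetical, and it uses plain NumPy (a real tokenizer would use PyTorch tensors inside the model).

```python
import numpy as np

class EMACodebook:
    """Hypothetical sketch: a VQ codebook with a small code dimension,
    k-means initialization, and EMA updates (no gradients on the codes)."""

    def __init__(self, num_codes=16, dim=32, decay=0.99, eps=1e-5):
        self.num_codes = num_codes
        self.dim = dim              # trick 1: keep this small (e.g. 32)
        self.decay = decay
        self.eps = eps
        self.codes = None           # initialized later via k-means
        self.ema_count = np.ones(num_codes)
        self.ema_sum = np.zeros((num_codes, dim))

    def kmeans_init(self, feats, iters=10, seed=0):
        """Trick 1: initialize codes with a few Lloyd iterations
        over a batch of encoder features."""
        rng = np.random.default_rng(seed)
        idx = rng.choice(len(feats), self.num_codes, replace=False)
        centers = feats[idx].copy()
        for _ in range(iters):
            assign = np.argmin(
                ((feats[:, None] - centers[None]) ** 2).sum(-1), axis=1)
            for k in range(self.num_codes):
                pts = feats[assign == k]
                if len(pts):
                    centers[k] = pts.mean(0)
        self.codes = centers
        self.ema_sum = centers.copy()
        self.ema_count = np.ones(self.num_codes)

    def quantize(self, feats):
        """Assign each feature to its nearest code, then refresh the
        codebook with EMA statistics (trick 2) instead of gradients."""
        dists = ((feats[:, None] - self.codes[None]) ** 2).sum(-1)
        assign = np.argmin(dists, axis=1)
        onehot = np.eye(self.num_codes)[assign]
        self.ema_count = (self.decay * self.ema_count
                          + (1 - self.decay) * onehot.sum(0))
        self.ema_sum = (self.decay * self.ema_sum
                        + (1 - self.decay) * onehot.T @ feats)
        # Laplace smoothing keeps rarely-used codes from collapsing.
        n = self.ema_count.sum()
        count = (self.ema_count + self.eps) / (n + self.num_codes * self.eps) * n
        self.codes = self.ema_sum / count[:, None]
        return self.codes[assign], assign

rng = np.random.default_rng(0)
feats = rng.normal(size=(256, 32)).astype(np.float64)
cb = EMACodebook(num_codes=16, dim=32)
cb.kmeans_init(feats)
quantized, assign = cb.quantize(feats)
```

For tricks 3 and 4, the same module would simply be bypassed in stage one (the selector/merger see the continuous features directly) and switched on in stage two, with a straight-through estimator passing gradients around the quantization step.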