Training detail of codebook
Faded1022 commented
Thanks for your work! I'm interested in it and tried to reproduce the Dynamic Visual Tokenizer, but my reconstruction loss is stuck around 0.3. Could you give me some suggestions for training? Thanks!
jy0205 commented
Hi, thanks for your attention! Here are some tricks we used in our training:
- Reduce the codebook dimension (32 is enough) and use k-means initialization for the codebook.
- Use the EMA update instead of direct gradient updates for the codebook.
- Train the selector and merger first, i.e., train without quantization.
- After the selector and merger are trained, enable vector quantization and update all modules end-to-end so the codebook is learned.
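The codebook tricks above (low dimension, k-means initialization, EMA updates with a straight-through estimator) can be sketched roughly as follows. This is a minimal illustrative PyTorch module, not the authors' actual implementation; all names (`EMAQuantizer`, `kmeans_init`, etc.) and hyperparameters are assumptions.

```python
import torch

class EMAQuantizer(torch.nn.Module):
    """Sketch of a vector quantizer with a low-dim codebook, k-means
    initialization, and EMA codebook updates (illustrative only)."""

    def __init__(self, num_codes=1024, dim=32, decay=0.99, eps=1e-5):
        super().__init__()
        self.decay, self.eps = decay, eps
        self.register_buffer("codebook", torch.randn(num_codes, dim))
        self.register_buffer("cluster_size", torch.zeros(num_codes))
        self.register_buffer("embed_avg", self.codebook.clone())
        self.inited = False

    @torch.no_grad()
    def kmeans_init(self, z, iters=10):
        # Seed the codebook by running k-means on the first batch of
        # encoder features (assumes the batch has >= num_codes vectors).
        centers = z[torch.randperm(z.size(0))[: self.codebook.size(0)]].clone()
        for _ in range(iters):
            assign = torch.cdist(z, centers).argmin(dim=1)
            for k in range(centers.size(0)):
                pts = z[assign == k]
                if pts.numel():
                    centers[k] = pts.mean(dim=0)
        self.codebook.copy_(centers)
        self.embed_avg.copy_(centers)
        self.cluster_size.fill_(1.0)
        self.inited = True

    def forward(self, z):
        # z: (N, dim) flattened encoder features.
        if not self.inited:
            self.kmeans_init(z.detach())
        idx = torch.cdist(z, self.codebook).argmin(dim=1)
        z_q = self.codebook[idx]
        if self.training:
            # EMA update: the codebook is moved toward the mean of its
            # assigned features instead of receiving gradients directly.
            with torch.no_grad():
                onehot = torch.nn.functional.one_hot(
                    idx, self.codebook.size(0)
                ).type(z.dtype)
                self.cluster_size.mul_(self.decay).add_(
                    onehot.sum(0), alpha=1 - self.decay
                )
                self.embed_avg.mul_(self.decay).add_(
                    onehot.t() @ z, alpha=1 - self.decay
                )
                n = self.cluster_size.sum()
                smoothed = (
                    (self.cluster_size + self.eps)
                    / (n + self.codebook.size(0) * self.eps)
                    * n
                )
                self.codebook.copy_(self.embed_avg / smoothed.unsqueeze(1))
        # Straight-through estimator: gradients bypass the quantization
        # step and flow back to the encoder.
        return z + (z_q - z).detach(), idx
```

The two-stage recipe then amounts to training the selector and merger with this module disabled (pass features through unquantized), and only instantiating the quantizer for the second, end-to-end stage.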