bytedance/1d-tokenizer

About the proxy code during training

Hi,

Congrats on this remarkable achievement -- I am quite fascinated by the idea of query-based image compression. Now that the code has been partially released, I have a question about the "proxy codes" you mention in the main paper. I quote the text below:

Specifically, in the first “warm-up” stage, instead of directly regressing the RGB values and employing
a variety of loss functions (as in existing methods), we propose to train 1D VQ models with the discrete
codes generated by an off-the-shelf MaskGIT-VQGAN model, which we refer to as proxy codes.

Based on my understanding, TiTok should produce significantly fewer codes than MaskGIT-VQGAN, no? At the 256^2 resolution, MaskGIT uses a fixed /16 downsampling factor, resulting in 256 tokens, whereas TiTok allows K = 32/64/128 at the same resolution. How did you warm up the quantized encoder with a teacher that outputs more tokens than your own model?
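For concreteness, here is a minimal sketch of how I imagine such a warm-up could look: the decoder reads out a fixed set of 256 learnable positional queries, so its output length is decoupled from K, and it is trained with cross-entropy against the frozen MaskGIT-VQGAN codes. Everything below (the `ProxyWarmupDecoder` module, the layer counts, the 1024-entry codebook size) is my own assumption for illustration, not the released implementation:

```python
import torch
import torch.nn as nn

# Assumed shapes: K latent tokens, 16x16 = 256 proxy-code positions,
# a 1024-entry proxy codebook (MaskGIT-VQGAN's codebook size).
K, N_PROXY, CODEBOOK, D = 32, 256, 1024, 512

class ProxyWarmupDecoder(nn.Module):
    """Maps K quantized latent tokens to logits over the proxy codebook
    at all 256 positions. The output length is fixed by the learnable
    positional queries, not by K, so K can be 32/64/128."""
    def __init__(self):
        super().__init__()
        # One learnable query per proxy-code position.
        self.pos_queries = nn.Parameter(torch.randn(1, N_PROXY, D))
        layer = nn.TransformerEncoderLayer(D, nhead=8, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=4)
        self.to_logits = nn.Linear(D, CODEBOOK)

    def forward(self, latent_tokens):  # (B, K, D)
        B = latent_tokens.size(0)
        queries = self.pos_queries.expand(B, -1, -1)  # (B, 256, D)
        # Queries and latent tokens attend jointly; only the query
        # positions are read out, so the output length is always 256.
        x = self.blocks(torch.cat([queries, latent_tokens], dim=1))
        return self.to_logits(x[:, :N_PROXY])  # (B, 256, CODEBOOK)

decoder = ProxyWarmupDecoder()
latents = torch.randn(2, K, D)  # quantized TiTok latent tokens
with torch.no_grad():
    # Targets from the frozen teacher; random stand-ins here.
    proxy_codes = torch.randint(0, CODEBOOK, (2, N_PROXY))
logits = decoder(latents)
loss = nn.functional.cross_entropy(
    logits.reshape(-1, CODEBOOK), proxy_codes.reshape(-1)
)
```

If that mental model is right, the teacher emitting more tokens than K would not be a problem, since the 256 prediction slots come from the decoder's queries rather than from the latent sequence.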

Thank you, and I think releasing the training code for this part would also help a lot! Again, congrats on this breakthrough!

Best,
XM

Hi, does this answer your question?

#1 (comment)

Hi,

Thank you all for your interest in our work and for your patience. We have released the two-stage training code; feel free to check it out.