
Question about the high-resolution pixel decoder



Very insightful work!
I have a question about the details of the new high-resolution pixel decoder, which supports generating high-resolution, highly aesthetic images at multiple aspect ratios.
Could you please release some details of the training process?
Thanks a lot!

Best regards

The high-resolution pixel decoder is trained with the same strategy as the original one. Given an input image, it takes the discrete visual token IDs produced by our visual tokenizer as the condition and aims to recover the original input.
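
To make the objective concrete, here is a minimal toy sketch of "tokenize, then train a decoder to reconstruct the input from the token IDs". It is not the actual training code: the real decoder architecture, loss, and hyperparameters are not stated in this thread. Everything below (`PATCH`, `VOCAB`, the random `codebook`, the lookup-table `decoder`, the MSE loss, and the learning rate) is a hypothetical stand-in chosen only to illustrate the idea.

```python
import numpy as np

rng = np.random.default_rng(0)

PATCH = 4             # hypothetical patch size
VOCAB = 16            # hypothetical codebook size
DIM = PATCH * PATCH   # pixels per patch

# Frozen stand-in for the visual tokenizer: a fixed random codebook.
codebook = rng.normal(size=(VOCAB, DIM))


def tokenize(image):
    """Split an (H, W) image into patches; return one discrete ID per patch."""
    h, w = image.shape
    patches = (
        image.reshape(h // PATCH, PATCH, w // PATCH, PATCH)
        .transpose(0, 2, 1, 3)
        .reshape(-1, DIM)
    )
    # Nearest codebook entry for each patch (squared Euclidean distance).
    dists = ((patches[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return dists.argmin(1), patches


def train_step(decoder, image, lr=0.5):
    """One gradient step: decode the token IDs and regress onto the input pixels."""
    ids, target = tokenize(image)      # the condition is the token IDs only
    pred = decoder[ids]                # toy decoder: lookup table ID -> patch
    err = pred - target
    loss = float((err ** 2).mean())    # MSE reconstruction loss (an assumption)
    grad = np.zeros_like(decoder)
    np.add.at(grad, ids, 2 * err / err.size)  # accumulate gradient per token ID
    decoder -= lr * grad               # in-place update of the decoder weights
    return loss


image = rng.normal(size=(16, 16))      # toy "input image"
decoder = np.zeros((VOCAB, DIM))       # only the decoder is trained
losses = [train_step(decoder, image) for _ in range(200)]
print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```

The point of the sketch is the data flow: the tokenizer stays frozen, the decoder only ever sees token IDs, and the training target is the original image itself.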