CrossmodalGroup/DynamicVectorQuantization

inference error

Opened this issue · 2 comments

Thanks for your code contribution. Do I have to resize the image to 256 when I use your pretrained checkpoint for first-stage inference? My images are 512*512 or other sizes. What should I do if I want to reconstruct them at the original size?

[Screenshot: 2023-06-12 18-33-19]

Hi, the pre-trained model was trained at 256 resolution, so you need to resize the image to 256. You can also train your own DQVAE at 512 resolution. We adopt VQGAN's encoder and decoder structure to construct DQVAE, so it should be easy to scale it to 512 resolution. Also, if you want to train the entropy-based DQVAE, you can first calculate the image entropy thresholds with the script scripts/tools/calculate_entropy_thresholds.py.
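For the resize workflow described above, a minimal sketch with Pillow might look like the following. It is not code from this repository: `reconstruct_fn` is a hypothetical placeholder for whatever runs the pretrained first-stage model on a 256x256 image, and the bicubic filter is an assumption. Upsampling the result back to the original size only restores the dimensions, not the detail lost at 256.

```python
from PIL import Image


def reconstruct_at_original_size(image, reconstruct_fn, model_size=256):
    """Resize to the model's training resolution, reconstruct, resize back.

    `reconstruct_fn` is a hypothetical callable (PIL image in, PIL image out)
    standing in for the pretrained first-stage model.
    """
    original_size = image.size  # (width, height)
    # Downscale to the resolution the checkpoint was trained on.
    small = image.resize((model_size, model_size), Image.BICUBIC)
    recon = reconstruct_fn(small)
    # Upscale the reconstruction back to the input's dimensions.
    return recon.resize(original_size, Image.BICUBIC)
```

Usage would be something like `reconstruct_at_original_size(Image.open(path).convert("RGB"), my_model_fn)`, where `my_model_fn` wraps the checkpoint.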

Thank you for your answer. But I'm wondering: are you saying that running inference on an image at a certain resolution requires training at that same resolution? As far as I know, VQGAN can run inference on images of arbitrary resolution, as long as the downsampling factor stays the same as in training.
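To make the fixed-downsampling-factor point concrete: with a fully convolutional encoder, the latent grid is just the input size divided by the factor, so the arithmetic works for any size that is a multiple of it. The sketch below is only that arithmetic; the factor of 16 is an assumption for illustration (it is not stated in this thread), and the function names are hypothetical. Whether the pretrained DQVAE reconstructs well at other sizes is a separate question this does not answer.

```python
def latent_grid(height, width, factor=16):
    """Latent grid size for a fully convolutional encoder.

    `factor` is the total downsampling factor (assumed to be 16 here).
    """
    if height % factor or width % factor:
        raise ValueError("height and width must be multiples of the factor")
    return height // factor, width // factor


def pad_to_multiple(height, width, factor=16):
    """Smallest size >= (height, width) that the factor divides evenly."""
    pad_h = (-height) % factor
    pad_w = (-width) % factor
    return height + pad_h, width + pad_w
```

Under this assumption a 256x256 input maps to a 16x16 grid and a 512x512 input to a 32x32 grid, so the number of latent positions grows fourfold.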