joanrod/ocr-vqgan

vqgan result

Winnie202 opened this issue · 2 comments

Thank you for sharing the code. I used taming-transformers to reconstruct Street View images, but the smaller text regions are not reconstructed well. If I train this model on this type of dataset, can it improve the VQGAN reconstruction of small text in images like these?
233952925-d06ce36a-19c0-49b0-aff6-f68c7ef89e03

233952976-b6de91dd-06ea-43f8-882c-04c776d13ecc

233952863-3f321c3c-ff90-4ad8-a932-434ae106378e

Hi @Winnie202, awesome results!! I'm glad you could reconstruct the small texts. We could try generating synthetic scene-text as a follow-up


Do you have any good suggestions for improving the reconstruction of the text in these scenes?