Transferring ocr-vqgan to an image super-resolution task for text within images?
songkq opened this issue · 3 comments
@joanrod Hi, as you treat ocr-vqgan as an image reconstruction task, I'm wondering whether it can be used for an image super-resolution task? Does the OCR-perceptual loss help to improve the quality of text within images?
Also, I'm confused: if we treat ocr-vqgan as a text-to-figure generation task, why does it require a ground-truth figure as input and then generate "degraded" text within the image? How could we use it for real-life text rendering applications?
Hi @songkq
ocr-vqgan can be seen as a compression module that preserves the fidelity of text within images, something vanilla vqgan or vae models commonly struggle with. That is why the ground truth is the original image and we evaluate on a reconstruction task.
We propose the ocr-perceptual loss, which can be used for super-resolution, as you suggest! Feel free to explore that and share your results :)
Thanks for reaching out!
Joan
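For anyone who wants to try the super-resolution idea, here is a rough sketch of how a feature-matching term could be added to a pixel loss. It is not code from the ocr-vqgan repository. It assumes the ocr-perceptual loss compares intermediate feature maps of the prediction and the ground truth, the way a standard perceptual loss does. The names `sr_loss`, `toy_features`, and `ocr_weight` are hypothetical, and `toy_features` is only a stand-in (simple edge maps) for a frozen, pretrained OCR network.

```python
from typing import Callable, List

# Grayscale image as rows of pixel intensities in [0, 1].
Image = List[List[float]]


def mean_abs_diff(a: Image, b: Image) -> float:
    """Mean absolute difference between two equally sized 2-D maps."""
    total = sum(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))
    count = sum(len(row) for row in a)
    return total / count if count else 0.0


def toy_features(img: Image) -> List[Image]:
    """Hypothetical stand-in for an OCR network's intermediate feature maps.

    Returns horizontal and vertical finite differences, which respond to
    stroke edges. A real setup would return activations of a frozen,
    pretrained OCR model instead.
    """
    horiz = [[row[j + 1] - row[j] for j in range(len(row) - 1)] for row in img]
    vert = [
        [img[i + 1][j] - img[i][j] for j in range(len(img[0]))]
        for i in range(len(img) - 1)
    ]
    return [horiz, vert]


def sr_loss(
    pred: Image,
    target: Image,
    features: Callable[[Image], List[Image]] = toy_features,
    ocr_weight: float = 1.0,
) -> float:
    """Pixel L1 loss plus a weighted feature-matching (perceptual) term."""
    pixel = mean_abs_diff(pred, target)
    perceptual = sum(
        mean_abs_diff(f_pred, f_target)
        for f_pred, f_target in zip(features(pred), features(target))
    )
    return pixel + ocr_weight * perceptual


# A sharp vertical stroke (ground truth) and a blurry super-resolved guess.
target = [[0.0, 1.0, 0.0] for _ in range(3)]
blurry = [[0.3, 0.4, 0.3] for _ in range(3)]

print("pixel loss only:  ", sr_loss(blurry, target, ocr_weight=0.0))
print("with feature term:", sr_loss(blurry, target, ocr_weight=1.0))
```

The feature term penalizes the blurry guess for losing the stroke edges, on top of the per-pixel error. In a real experiment `pred` would be the super-resolution network's output for a downsampled input, `target` the high-resolution original, and both the loss and the feature extractor would be written in a framework with automatic differentiation.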