joanrod/ocr-vqgan

Transferring ocr-vqgan to an image super-resolution task for text within images?

songkq opened this issue · 3 comments

@joanrod Hi, since you treat ocr-vqgan as an image reconstruction task, I'm wondering whether it can also be used for image super-resolution. Does the OCR-perceptual loss help improve the quality of text within images?
Also, I'm confused: if we treat ocr-vqgan as a text-to-figure generation task, why does it require a ground-truth figure as input and then generate a "degraded" version of the text within that image? How could we use it for real-life text rendering applications?

Hi @songkq

ocr-vqgan can be seen as a compression module that preserves the fidelity of text within images, which is a common weakness of vanilla VQGAN or VAE models. That is why the ground truth is the original image and we evaluate on a reconstruction task.
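To make the "compression module" framing concrete, here is a toy sketch (not the ocr-vqgan implementation) of the vector-quantization step at the core of VQGAN-style models: each latent vector is replaced by its nearest codebook entry, the integer indices are the compressed representation, and the output is scored against the original input. As a simplifying assumption, the encoder and decoder are identity maps and the codebook is random rather than learned, so the "reconstruction" is just the quantized vectors.

```python
import numpy as np

def vector_quantize(z, codebook):
    """Map each row of z (N x D latents) to its nearest codebook entry (K x D).

    Returns the integer code indices (the compressed representation)
    and the quantized vectors that a decoder would consume.
    """
    # Squared Euclidean distance between every latent and every code: N x K.
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)
    return indices, codebook[indices]

def reconstruction_error(x, x_hat):
    """Mean squared error between the original input and its reconstruction."""
    return float(np.mean((x - x_hat) ** 2))

# Toy setup (hypothetical): identity "encoder"/"decoder" on 2-D vectors,
# so the reconstruction of x is simply its quantized version.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 2))         # 8 "patches" of the ground-truth image
codebook = rng.normal(size=(4, 2))  # 4 discrete codes (random, not learned)
codes, x_hat = vector_quantize(x, codebook)
print(codes, reconstruction_error(x, x_hat))
```

Because the training target is the input itself, the only thing that limits quality is how much information survives the discrete bottleneck, which is where small text tends to get lost.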

We propose an OCR-perceptual loss, which could also be used for super-resolution, as you suggest! Feel free to explore that and share results :)
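One way an OCR-perceptual term might be reused for super-resolution is shown in this sketch, under loud assumptions: `toy_ocr_features` is a hypothetical stand-in for a frozen pretrained OCR network (it just returns image gradients at two scales so the example runs), and the L1 feature distance, equal layer weights and `lam=0.1` are illustrative choices, not values taken from ocr-vqgan. The super-resolved output is penalized when its text-sensitive features differ from those of the high-resolution target, on top of the usual pixel loss.

```python
import numpy as np

def toy_ocr_features(img):
    """Hypothetical stand-in for a frozen OCR backbone.

    Returns a list of "feature maps": horizontal and vertical intensity
    gradients at full and half resolution. A real setup would use
    intermediate activations of a pretrained text detector/recognizer.
    """
    feats = []
    for scale in (1, 2):
        s = img[::scale, ::scale]         # naive downsampling for the coarser scale
        feats.append(np.diff(s, axis=1))  # horizontal gradients
        feats.append(np.diff(s, axis=0))  # vertical gradients
    return feats

def ocr_perceptual_loss(sr, hr, feature_fn=toy_ocr_features, weights=None):
    """Weighted mean-L1 distance between feature maps of the SR output and the HR target."""
    f_sr, f_hr = feature_fn(sr), feature_fn(hr)
    if weights is None:
        weights = [1.0] * len(f_sr)
    return float(sum(w * np.mean(np.abs(a - b)) for w, a, b in zip(weights, f_sr, f_hr)))

def sr_training_loss(sr, hr, lam=0.1):
    """Pixel L1 loss plus a lam-weighted OCR-perceptual term (lam is illustrative)."""
    pixel = float(np.mean(np.abs(sr - hr)))
    return pixel + lam * ocr_perceptual_loss(sr, hr)

# Hypothetical example: a vertical "stroke" on a 16x16 canvas.
hr = np.zeros((16, 16))
hr[4:12, 7:9] = 1.0
# Simulated blurry SR output: 3-tap horizontal box blur of the target.
blurry = (hr + np.roll(hr, 1, axis=1) + np.roll(hr, -1, axis=1)) / 3.0
print(ocr_perceptual_loss(hr, hr))  # identical images give zero loss
print(sr_training_loss(blurry, hr))
```

Swapping `toy_ocr_features` for real OCR-network activations is the part that would make the term actually text-aware; the gradient stand-in only rewards sharp edges in general.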

Thanks for reaching out!
Joan

@joanrod Thanks for your reply. So ocr-vqgan is suited to perceptual compression of text within images, right?

That is correct @songkq