Transferring ocr-vqgan to an image super-resolution task for text within images?

Question

songkq opened this issue 2 years ago · 3 comments

@joanrod Hi, since you treat ocr-vqgan as an image reconstruction task, I'm wondering whether it can also be used for image super-resolution. Does the OCR perceptual loss help improve the quality of text within images?
Also, I'm confused: if we treat ocr-vqgan as a text-to-figure generation task, why does it require a ground-truth figure as input and then generate an image with "degraded" text? How could we use it for real-life text rendering applications?

Answer 1 · 2023-02-09T15:22:21.000Z

Hi @songkq

ocr-vqgan can be seen as a compression module that preserves the fidelity of text within images, which vanilla vqgan and vae models commonly fail to do. That is why the ground truth is the original image and why we evaluate on a reconstruction task.
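
To make the "compression" view concrete, here is a minimal sketch of the vector-quantization step that a vqgan-style model relies on: each continuous latent vector is replaced by its nearest codebook entry, so an image can be stored as a grid of integer indices. The function name `quantize`, the tensor sizes, and the random codebook are assumptions for illustration only, not the ocr-vqgan code.

```python
import torch


def quantize(latents: torch.Tensor, codebook: torch.Tensor):
    """Replace each latent vector with its nearest codebook entry.

    latents:  (N, D) continuous encoder outputs
    codebook: (K, D) code vectors
    Returns the quantized latents (N, D) and the chosen indices (N,).
    """
    distances = torch.cdist(latents, codebook)  # (N, K) Euclidean distances
    indices = distances.argmin(dim=1)           # nearest code per latent
    return codebook[indices], indices


torch.manual_seed(0)
codebook = torch.randn(16, 4)  # hypothetical: 16 codes of dimension 4
latents = torch.randn(10, 4)   # hypothetical encoder output, 10 positions
quantized, indices = quantize(latents, codebook)
```

Only `indices` needs to be stored; a decoder would reconstruct the image from `codebook[indices]`, and the reconstruction is then compared with the original image.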

We propose an OCR perceptual loss, which can also be used for super-resolution, as you suggest! Feel free to explore that and share your results :)
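
A rough sketch of how that could look in PyTorch, assuming the OCR perceptual loss compares intermediate feature maps of a frozen, pretrained text/OCR network between the model output and the ground truth. `PlaceholderTextFeatures` is a hypothetical, randomly initialized stand-in for such a network, and `ocr_weight`, the L1 pixel term, and the image sizes are illustrative assumptions rather than values from the ocr-vqgan code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PlaceholderTextFeatures(nn.Module):
    """Hypothetical stand-in for a frozen, pretrained text/OCR feature network."""

    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.Conv2d(8, 16, kernel_size=3, stride=2, padding=1),
        ])
        for p in self.parameters():
            p.requires_grad_(False)  # the feature network is never trained

    def forward(self, x):
        feats = []
        for layer in self.layers:
            x = F.relu(layer(x))
            feats.append(x)  # keep every intermediate feature map
        return feats


def ocr_perceptual_loss(feature_net, prediction, target):
    """Sum of mean squared distances between intermediate feature maps."""
    loss = prediction.new_zeros(())
    for fp, ft in zip(feature_net(prediction), feature_net(target)):
        loss = loss + F.mse_loss(fp, ft)
    return loss


def super_resolution_loss(feature_net, sr_image, hr_image, ocr_weight=1.0):
    """Pixel-wise L1 term plus the weighted feature-space term."""
    pixel = F.l1_loss(sr_image, hr_image)
    return pixel + ocr_weight * ocr_perceptual_loss(feature_net, sr_image, hr_image)


torch.manual_seed(0)
feature_net = PlaceholderTextFeatures().eval()
hr_image = torch.rand(1, 3, 32, 32)                      # ground-truth high-res image
sr_image = torch.rand(1, 3, 32, 32, requires_grad=True)  # stand-in for a model output
loss = super_resolution_loss(feature_net, sr_image, hr_image)
loss.backward()  # gradients flow to the output, not to the frozen network
```

In a real experiment, `sr_image` would come from the super-resolution model and the placeholder would be swapped for an actual pretrained text network; the weighting between the two terms would need tuning.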

Thanks for reaching out!
Joan

Answer 2 · 2023-02-11T05:09:16.000Z

@joanrod Thanks for your reply. So ocr-vqgan is suitable for perceptual compression of within-image text, right?

Answer 3 · 2023-02-11T14:37:38.000Z

That is correct @songkq