Optical Character Recognition

Question

Closed this issue a month ago · 1 comment

flatsiedatsie commented 2 months ago

Feature request

I was thinking it would be awesome if you could scan text or documents with your phone's camera.

This is not at all important, since there already seem to be solutions for doing this in the browser:

TensorFlow.js has a demo
Tesseract.js seems to be popular too.

Motivation

Allowing a device to 'read' the world could be a valuable building block. It could simplify creating interesting pipelines when combined with the existing Transformers.js features. E.g.

Detect signs and other objects in public space, OCR them, translate them, and speak them out loud, or show the translation on top of the image. This could be useful during travel, or for people who aren't great at reading.
Aid in scanning documents, which can then be analyzed with the endless other text manipulation features. Create summaries of texts, scan for safe/good options in restaurant menus, save business cards, read license plates, create games, etc.

Your contribution

I can help research models, aid in testing and design/build a demo.

Answer 1 · 2024-04-18T14:52:49.000Z

Seems it's already supported through:

https://huggingface.co/Xenova/donut-base-finetuned-docvqa

More details:
https://huggingface.co/naver-clova-ix/donut-base-finetuned-docvqa
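
For anyone landing here later, this is a rough sketch of how that model might be called from Transformers.js. It assumes the `@xenova/transformers` package and its `document-question-answering` pipeline. The helper names `firstAnswer` and `askDocument` are made up for illustration and are not part of the library. Note that this model answers a question about a document image rather than returning a full transcription of the text, so it covers only part of the use cases described above.

```javascript
// Hypothetical helper (not part of Transformers.js): pull the first answer
// string out of the pipeline output. Assumes the output is an array of
// objects that each carry an `answer` field.
function firstAnswer(output) {
  if (!Array.isArray(output) || output.length === 0) return null;
  const answer = output[0] && output[0].answer;
  return typeof answer === 'string' ? answer : null;
}

// Hypothetical wrapper: ask one question about one document image.
// `image` is a URL or path to an image; `question` is plain text.
async function askDocument(image, question) {
  // Imported lazily so the helper above works without the package installed.
  const { pipeline } = await import('@xenova/transformers');

  // Model id taken from the link in the answer above.
  const qa = await pipeline(
    'document-question-answering',
    'Xenova/donut-base-finetuned-docvqa'
  );

  const output = await qa(image, question);
  return firstAnswer(output);
}
```

A call could look like `askDocument(imageUrl, 'What is the invoice number?').then(console.log)`, where `imageUrl` is a placeholder for your own scanned document or camera frame. The model is downloaded on first use, so the first call is likely to be slow.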