Optical Character Recognition
flatsiedatsie commented
Feature request
I was thinking it would be awesome if you could scan text or documents with your phone's camera.
This is not at all important, since there already seem to be ways to do this in the browser:
- TensorFlow.js has a demo
- Tesseract.js seems to be popular too.
Motivation
Allowing a device to 'read' the world could be a valuable building block. Combined with the existing Transformers.js features, it could simplify building interesting pipelines. For example:
- Detect signs and other objects in public spaces, OCR them, translate them, and then either speak them out loud or show the translation on top of the image. This could be useful while traveling, or for people who have difficulty reading.
- Help scan documents, which could then be analyzed with the many other text-processing features: create summaries of texts, scan restaurant menus for safe/good options, save business cards, read license plates, create games, etc.
Your contribution
I can help research models, aid in testing and design/build a demo.
flatsiedatsie commented
It seems this is already supported through:
https://huggingface.co/Xenova/donut-base-finetuned-docvqa
More details:
https://huggingface.co/naver-clova-ix/donut-base-finetuned-docvqa