xenova/transformers.js

Optical Character Recognition

Closed this issue · 1 comment

Feature request

I was thinking it would be awesome if you could scan text or documents with your phone's camera.

This is not at all important, since there already seem to be solutions for doing this in the browser.

Motivation

Allowing a device to 'read' the world could be a valuable building block. Combined with the existing Transformers.js features, it could make it simpler to create interesting pipelines. For example:

  • Detect signs and other objects in public spaces, OCR them, translate them, and speak them out loud, or show the translation on top of the image. This could be useful during travel, or for people who have difficulty reading.
  • Aid in scanning documents, which can then be analyzed with the many other text manipulation features: create summaries of texts, scan restaurant menus for safe/good options, save business cards, read license plates, create games, etc.
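
To make the first idea a bit more concrete, here is a rough sketch of how the stages could be chained once an OCR step exists. Everything in it is an assumption on my part: `readSignAloud`, `recognizeText`, `translate`, and `speak` are hypothetical names, not Transformers.js APIs. The stages are passed in as plain async functions, so the OCR step could be whatever model ends up being supported, the translation step could wrap an existing translation pipeline, and the speech step could wrap something like the browser's speech synthesis.

```javascript
// Hypothetical sketch: none of these names are part of Transformers.js.
// Each stage is an async function supplied by the caller.
async function readSignAloud(image, { recognizeText, translate, speak }) {
  // 1. OCR: turn the image into plain text (the feature requested here).
  const rawText = await recognizeText(image);
  const text = (rawText || '').trim();

  // Nothing legible in the image: skip the remaining stages.
  if (text === '') {
    return { text: '', translation: '', spoken: false };
  }

  // 2. Translate the recognized text.
  const translation = await translate(text);

  // 3. Speak the translation out loud.
  await speak(translation);

  return { text, translation, spoken: true };
}

// Usage with stand-in stages, only to show the data flow.
const demoStages = {
  recognizeText: async () => ' Ausgang ',
  translate: async (t) => (t === 'Ausgang' ? 'Exit' : t),
  speak: async (t) => console.log(`speaking: ${t}`),
};

readSignAloud(null, demoStages).then((result) => console.log(result));
```

Keeping the stages injectable means a demo could start with stand-ins like the ones above and swap in real models one at a time.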

Your contribution

I can help research models, assist with testing, and design or build a demo.