OCR for Japanese text in your browser!

Manga-OCR for Google Chrome

This extension lets you easily look up the meaning of words and sentences in Japanese manga that you read online in your browser. It aims to complement extensions like Yomichan and Rikaikun by performing OCR directly in the browser, using the excellent manga-ocr model by kha-white.

It can also fetch machine translations from online services (currently only OpenAI's ChatGPT, GPT-3.5).
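As a rough illustration of what a translation request to OpenAI can look like, here is a minimal sketch that builds (but does not send) a request for the Chat Completions endpoint. The function name, the type names, and the prompt wording are hypothetical and are not taken from this extension's source; only the endpoint URL, the `gpt-3.5-turbo` model name, and the request shape follow OpenAI's public API.

```typescript
// Sketch only: names and prompt text below are illustrative assumptions,
// not this extension's actual implementation.
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

interface PreparedRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Build a Chat Completions request asking for an English translation
// of the given Japanese text. Nothing is sent over the network here.
function buildTranslationRequest(japaneseText: string, apiKey: string): PreparedRequest {
  const messages: ChatMessage[] = [
    { role: "system", content: "Translate the following Japanese text into English." },
    { role: "user", content: japaneseText },
  ];
  return {
    url: "https://api.openai.com/v1/chat/completions",
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model: "gpt-3.5-turbo", messages }),
    },
  };
}
```

The returned `url` and `init` can be passed straight to `fetch(url, init)`. Keeping the request construction separate from the network call makes it easy to inspect or test the payload without an API key.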

This is a hobby project, and your feedback/contributions are welcome!

Exporting the manga-ocr model to ONNX

Hopefully the prerequisite files will be available in the Hugging Face repository soon; in the meantime, you can export them yourself with a little extra effort.

See the install, prerequisite, and troubleshooting steps in the Optimum documentation: https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model

Note that the correct task for this model is vision2seq-lm, as specified below.

optimum-cli export onnx -m 'kha-white/manga-ocr-base' ./OUTPUT_FOLDER --task=vision2seq-lm

This should produce a pair of files, encoder-model.onnx and decoder-model.onnx, which you should place in the ./res folder before building.
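If you want to confirm that both exported files are in place before building, a small check along these lines may help. This is a sketch, not part of the project: the helper name `missingModelFiles` is hypothetical, and the file names are simply the two named above.

```typescript
// File names as listed in this README; the helper itself is a hypothetical sketch.
const REQUIRED_MODEL_FILES: readonly string[] = ["encoder-model.onnx", "decoder-model.onnx"];

// Given the file names found in ./res, return the required model files
// that are still missing (an empty array means everything is present).
function missingModelFiles(filesInRes: readonly string[]): string[] {
  const present = new Set(filesInRes);
  return REQUIRED_MODEL_FILES.filter((name) => !present.has(name));
}
```

In a Node script you could feed it the result of `fs.readdirSync("./res")` and abort the build with a clear message if the returned list is not empty.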

Performance considerations in onnxruntime-web

Currently the model runs as a WebAssembly module. By default (possibly depending on hardware), the runtime attempts to spawn several worker threads. The content security policy that Chrome applies inside these workers differs from the one in the main context, so the workers fail to load the .wasm assemblies. As a workaround, the total number of threads is set to 1 in ocr.ts, which allows at least one thread to load and process OCR requests.

import * as ort from 'onnxruntime-web';

// Set ORT threads to 1, since the CSP permissions are currently broken in workers.
ort.env.wasm.numThreads = 1;
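The workaround above can also be wrapped in a small helper so it is applied in one place. The following is a sketch under stated assumptions: `WasmEnvLike` and `forceSingleThread` are hypothetical names, and the interface models only the single field used here, not the full onnxruntime-web `env` object.

```typescript
// Minimal structural type covering only the setting used in this sketch;
// the real `ort.env` object exposes many more options.
interface WasmEnvLike {
  wasm: { numThreads?: number };
}

// Hypothetical helper: force single-threaded WebAssembly execution so that
// no extra worker threads (which fail to load the .wasm files) are spawned.
function forceSingleThread(env: WasmEnvLike): void {
  env.wasm.numThreads = 1;
}
```

In the extension this would be called as `forceSingleThread(ort.env)`. The `ort.env.wasm` flags need to be set before the first inference session is created, so the call belongs at startup rather than next to each OCR request.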

Thank you for your patience!

Check out these other cool projects:

  • manga-ocr - An OCR model for Japanese text, focused on manga (also on Hugging Face)
  • EasyOCR - Another popular OCR library with 80+ supported languages
  • yomitan - A community-maintained fork of the now-sunset Yomichan project (Chrome extension)
  • rikaikun - A Chrome extension that shows definitions of Japanese words when you hover over them
  • Manga Image Translator - Another tool designed to translate Japanese manga that supports inpainting and text rendering
  • Balloon Translator - Yet another computer-aided comic/manga translation tool powered by deep learning
  • Sugoi Translation Toolkit - A suite of tools you can use to translate Japanese manga, games, and other media
  • Mokuro - A Japanese learning tool that lets you batch OCR downloaded manga and generate HTML files for reading. Works best with an extension like yomitan or rikaikun.
  • VGT - An Electron app that also offers OCR + LLM API integrations
  • openai - JavaScript/TypeScript API library for interacting with OpenAI's services
  • Prompt Engineering Guide - A guide on how to build better prompts to use with LLMs

TODO

  • Basic support for OCR using manga-ocr
  • Keeps a history of previous OCR requests (within one session)
  • Batch translation requests to OpenAI
  • Automatically fetch the ONNX model from Hugging Face
  • Explore managing secrets using a secure OAuth2 environment
  • Explore using text-box detection models to find text boxes more quickly
  • Support additional web-compiled OCR models
  • Allow live edits of OCR results for incorrect captures
  • Additional tools to help build vocabulary based on captured text
  • Support web-compiled translation models for fully offline translation (e.g. Sugoi)