
An OCR tool using maim with Transformers.

Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0).

Transformers OCR

https://tatsumoto.neocities.org/blog/mining-from-manga.html


An OCR tool for the GNU operating system that uses Transformers. Supports Xorg and Wayland.

Demo video: ocr.mp4

This Manga OCR application is likely the most suckless and lightweight option available. It is designed to work best with a tiling window manager. It has few dependencies, and you probably already have all of them installed. It still relies on large Python libraries, however, so to isolate the bloat these libraries are installed in a dedicated folder. If your computer is rather slow, use Tesseract instead.

Installation

Arch Linux and Arch-based distros

Install from the AUR.

Other distros

If you want to package this program for your distribution and know how to do it, please create a pull request. Otherwise, read the section below.

To install manually (not recommended)

The steps below are for people who can't access the AUR.

Step 1. Install the dependencies for your display server or desktop environment (Xorg, Wayland, GNOME, or KDE) if they are not already installed.

Step 2. Install the program using the Makefile.

git clone 'https://github.com/Ajatt-Tools/transformers_ocr.git'
cd -- 'transformers_ocr'
sudo make install

Setup

Before you start, download manga-ocr data:

transformers_ocr download

The files will be saved to ~/.local/share/manga_ocr.

Usage

To show a help page, run transformers_ocr help.

To OCR text on a manga page, run:

transformers_ocr recognize

Bind the command to a keyboard shortcut using your WM's config. This enables you to call the OCR from anywhere, as shown in the demo video.

For example, if you use i3wm, add this line to the config file.

bindsym $mod+o  exec --no-startup-id transformers_ocr recognize

The first run takes longer than usual because additional files are downloaded and saved to ~/.cache/huggingface.

On the first run, transformers_ocr launches a listener process that runs in the background and reads any new screenshots passed to it. To speed up the first run, add the command below to autostart (using ~/.profile, ~/.xinitrc, etc.).

transformers_ocr start
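
The sketch below illustrates why starting the listener early helps: the slow model load happens once, and every later screenshot reuses the loaded model. It is not the program's actual code. It uses a thread and an in-memory queue as a stand-in for the separate background process, and the names `Listener`, `load_model`, `submit`, and `results` are hypothetical.

```python
import queue
import threading


class Listener:
    """Toy background listener: load the model once, then serve requests."""

    def __init__(self, load_model):
        self._load_model = load_model      # hypothetical slow loader
        self._requests = queue.Queue()     # screenshot paths waiting for OCR
        self.results = queue.Queue()       # recognized text, in request order
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def submit(self, screenshot_path):
        self._requests.put(screenshot_path)

    def stop(self):
        self._requests.put(None)           # sentinel tells the loop to exit
        self._thread.join()

    def _run(self):
        model = self._load_model()         # expensive step, done only once
        while True:
            path = self._requests.get()
            if path is None:
                break
            self.results.put(model(path))
```

With this layout, only the first request pays for loading. Starting the listener at login moves that cost out of the first OCR call.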

Holding text

Quite often one sentence, phrase or a chunk of meaning is split between two or more speech bubbles. This is a problem because if you take a screenshot of the whole area, including the area between the speech bubbles, you will likely end up with junk in the results. Processing each bubble separately is also not ideal since you want to analyze the entire sentence in GoldenDict, add it to Anki, etc.

A solution is to have transformers_ocr hold text for you. It will recognize one speech bubble, remember it, then wait for another, and only copy the text from all bubbles together when you're done.

To use this feature, add a new keyboard shortcut to the config file of your WM, for example Mod+Shift+o. Example for i3wm:

bindsym $mod+Shift+o  exec --no-startup-id transformers_ocr hold

Demo video: screencast.mp4

Every time you call hold, a speech bubble will be recognized and saved for later. Finally, call recognize using the usual keyboard shortcut to copy the last speech bubble and all the saved ones together. The list of saved bubbles will be emptied when calling recognize.
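
The hold and recognize behavior described above can be modeled as a small buffer. This is a minimal sketch, not the program's actual implementation. The class name `HoldBuffer` is hypothetical, and joining the bubbles without a separator is an assumption made here because Japanese text does not use spaces between words.

```python
class HoldBuffer:
    """Toy model of the hold/recognize behavior."""

    def __init__(self):
        self._held = []

    def hold(self, text):
        # Save a recognized bubble for later; nothing is copied yet.
        self._held.append(text)

    def recognize(self, text):
        # Combine all saved bubbles with the last one, then empty the list.
        # Assumption: bubbles are joined without a separator.
        combined = "".join(self._held + [text])
        self._held.clear()
        return combined
```

For example, two `hold` calls followed by one `recognize` call return the text of all three bubbles in the order they were captured, and the next `recognize` call starts from an empty list.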

Config file

Optionally, you can create a config file.

mkdir -p ~/.config/transformers_ocr
touch ~/.config/transformers_ocr/config

Each line must have this format: key=value. Lines that start with # are ignored.
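
To make the format concrete, here is a minimal sketch of a parser for it. It is not the program's own parser, and the function name `parse_config` is hypothetical. Stripping surrounding whitespace and skipping blank lines or lines without `=` are assumptions made here for robustness.

```python
def parse_config(text):
    """Parse key=value lines, ignoring lines that start with '#'."""
    config = {}
    for line in text.splitlines():
        line = line.strip()  # assumption: surrounding whitespace is not significant
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # assumption: silently skip lines without '='
        config[key.strip()] = value.strip()
    return config


sample = """\
# Send recognized text to GoldenDict instead of the clipboard.
clip_command=goldendict %TEXT%
force_cpu=yes
"""
print(parse_config(sample))
```

Splitting on the first `=` only means a value may itself contain `=` characters.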

Send text to an external application

Instead of copying text to the clipboard, you may want to pass it as an argument to an external application. In the example below, clip_command is set to goldendict, which sends recognized text directly to GoldenDict and keeps the system clipboard free for other tasks.

echo 'clip_command=goldendict %TEXT%' >> ~/.config/transformers_ocr/config
transformers_ocr stop
transformers_ocr start

If %TEXT% is passed as a parameter, it will be replaced with the actual text in the speech bubble. If not, the text will be passed to stdin of the called program.
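
The rule above can be sketched as follows. This is an illustration under stated assumptions, not the program's actual code. The function names `build_invocation` and `send_text` are hypothetical, and splitting the command with `shlex` is a choice made for this example.

```python
import shlex
import subprocess
from typing import Optional


def build_invocation(clip_command, text):
    """Return (argv, stdin_data) for the external program.

    If any argument contains %TEXT%, substitute the recognized text there
    and send nothing on stdin. Otherwise pass the text on stdin.
    """
    args = shlex.split(clip_command)
    if any("%TEXT%" in arg for arg in args):
        argv = [arg.replace("%TEXT%", text) for arg in args]
        stdin_data: Optional[str] = None
    else:
        argv = args
        stdin_data = text
    return argv, stdin_data


def send_text(clip_command, text):
    # Run the external program without a shell, so the text is never
    # interpreted as shell syntax.
    argv, stdin_data = build_invocation(clip_command, text)
    subprocess.run(argv, input=stdin_data, text=True, check=False)
```

Splitting the command before substituting keeps the recognized text as a single argument even when it contains spaces or quotes.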

Force CPU

To force the OCR to run on the CPU, add the force_cpu option:

echo 'force_cpu=yes' >> ~/.config/transformers_ocr/config