TesserNet

TesserNet provides high-level bindings for Tesseract in .NET. The library ships with all required native libraries and a trained English model, so on Windows no additional setup is needed to get it up and running (see Limitations for other platforms). The library also provides a simple Tesseract instance pooling system (through the TesseractPool class), so you can make asynchronous OCR calls without managing individual instances yourself.

Limitations

Windows is currently the only platform that works without installing extra dependencies. On Linux, the tesseract-ocr package must be installed. On distributions that use apt as the package manager (e.g. Ubuntu, Debian, Raspbian), this can be done with sudo apt-get install tesseract-ocr. Linux support is new and experimental: problems can arise if tesseract-ocr is not available or if the installed version is too old. iOS is not yet supported.

Downloads

TesserNet
TesserNet for System.Drawing
TesserNet for ImageSharp
TesserNet for SkiaSharp

License

This product includes Leptonica, which is available under a "BSD 2-clause" license.
This product includes Tesseract, which is available under an "Apache Version 2.0" license.

Usage

When using TesserNet on Linux, make sure tesseract-ocr has been installed on your system.

There are a few example projects available for you to try out in the src directory. Note that the TesserNet.Example.System.Drawing example uses .NET Framework, meaning it will only run on Windows.

To start off, one first needs to add the following import:

using TesserNet;

One can then create a Tesseract instance:

Tesseract tesseract = new Tesseract();

With that instance one can now perform OCR.

string result = tesseract.Read(...);

By default, the following Read methods are provided:

string Read(byte[] data, int width, int height, int bytesPerPixel);
string Read(byte[] data, int width, int height, int bytesPerPixel, int rectX, int rectY, int rectWidth, int rectHeight);
Task<string> ReadAsync(byte[] data, int width, int height, int bytesPerPixel);
Task<string> ReadAsync(byte[] data, int width, int height, int bytesPerPixel, int rectX, int rectY, int rectWidth, int rectHeight);
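
As a rough illustration of the raw-buffer overloads listed above, the sketch below passes a blank 8-bit grayscale buffer to Read. The image dimensions, the tightly packed row-by-row pixel layout, and the reading of rectX and rectY as the top-left corner of the region are assumptions made for this example and are not confirmed here; real pixel data would come from your own image source.

```csharp
using System;
using TesserNet;

public static class RawBufferExample
{
    public static void Main()
    {
        // Hypothetical image: 200x50 pixels, 1 byte per pixel (8-bit grayscale).
        const int width = 200;
        const int height = 50;
        const int bytesPerPixel = 1;

        // Assumption: pixels are tightly packed, row by row.
        byte[] data = new byte[width * height * bytesPerPixel];
        for (int i = 0; i < data.Length; i++)
        {
            data[i] = 255; // blank white page; replace with real pixel data
        }

        Tesseract tesseract = new Tesseract();

        // Run OCR on the whole buffer.
        string fullText = tesseract.Read(data, width, height, bytesPerPixel);

        // Run OCR on a 100x20 region only.
        // Assumption: (rectX, rectY) = (10, 5) is the region's top-left corner.
        string regionText = tesseract.Read(data, width, height, bytesPerPixel, 10, 5, 100, 20);

        Console.WriteLine(fullText);
        Console.WriteLine(regionText);
    }
}
```

If the Tesseract class implements IDisposable in the version you use, it is probably worth wrapping the instance in a using statement.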

Additionally, if one prefers to use System.Drawing, ImageSharp or SkiaSharp, it is possible to also add a dependency on TesserNet.System.Drawing, TesserNet.ImageSharp or TesserNet.SkiaSharp respectively. Adding any of these dependencies adds the following Read methods:

string Read(Image image);
string Read(Image image, Rectangle rectangle);
Task<string> ReadAsync(Image image);
Task<string> ReadAsync(Image image, Rectangle rectangle);
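
The image-based overloads might be used roughly as follows with the System.Drawing bridge. This is a minimal sketch: the file name sample.png is hypothetical, and it assumes that the added Read methods become visible through the same using TesserNet; directive, which may differ in practice.

```csharp
using System;
using System.Drawing;
using TesserNet;

public static class ImageExample
{
    public static void Main(string[] args)
    {
        // Hypothetical input file; pass a real path as the first argument.
        string path = args.Length > 0 ? args[0] : "sample.png";

        using (Image image = Image.FromFile(path))
        {
            Tesseract tesseract = new Tesseract();

            // OCR over the whole image.
            // Assumption: this overload is in scope via "using TesserNet;".
            string allText = tesseract.Read(image);

            // OCR over the top half of the image only.
            Rectangle topHalf = new Rectangle(0, 0, image.Width, image.Height / 2);
            string topText = tesseract.Read(image, topHalf);

            Console.WriteLine(allText);
            Console.WriteLine(topText);
        }
    }
}
```

With ImageSharp or SkiaSharp the calls would presumably look the same, with Image and Rectangle replaced by the corresponding types of that library.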

Furthermore, when trying to use concurrency, it might be useful to have a look at the TesseractPool class:

TesseractPool pool = new TesseractPool();

The TesseractPool class provides a pooling mechanism for running the OCR on multiple Tesseract instances, without having to manually deal with all the different instances. The class has the following methods:

string Read(byte[] data, int width, int height, int bytesPerPixel);
string Read(byte[] data, int width, int height, int bytesPerPixel, int rectX, int rectY, int rectWidth, int rectHeight);
Task<string> ReadAsync(byte[] data, int width, int height, int bytesPerPixel);
Task<string> ReadAsync(byte[] data, int width, int height, int bytesPerPixel, int rectX, int rectY, int rectWidth, int rectHeight);

And when any of the aforementioned image processing bridging libraries is present:

string Read(Image image);
string Read(Image image, Rectangle rectangle);
Task<string> ReadAsync(Image image);
Task<string> ReadAsync(Image image, Rectangle rectangle);
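
To show how the pool could fit into concurrent code, the sketch below starts several ReadAsync calls on one TesseractPool and awaits them together with Task.WhenAll. The three blank buffers and their dimensions are placeholders invented for this example; how many Tesseract instances the pool creates and how it schedules work is not specified here.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using TesserNet;

public static class PoolExample
{
    // An async Main method requires C# 7.1 or later.
    public static async Task Main()
    {
        // Hypothetical page size: 200x50 pixels, 8-bit grayscale.
        const int width = 200;
        const int height = 50;
        const int bytesPerPixel = 1;

        // Three placeholder pages; replace with real pixel buffers.
        byte[][] pages = Enumerable.Range(0, 3)
            .Select(_ => new byte[width * height * bytesPerPixel])
            .ToArray();

        TesseractPool pool = new TesseractPool();

        // Start all OCR calls without waiting for each one to finish.
        Task<string>[] pending = pages
            .Select(page => pool.ReadAsync(page, width, height, bytesPerPixel))
            .ToArray();

        // Wait for every call; results keep the order of the input pages.
        string[] results = await Task.WhenAll(pending);

        for (int i = 0; i < results.Length; i++)
        {
            Console.WriteLine($"Page {i}: {results[i]}");
        }
    }
}
```

The calling code never picks a Tesseract instance itself, which matches the purpose of the pool as described above.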