TesserNet provides high-level bindings for Tesseract in .NET.
The library comes with all required native libraries and a trained English model, meaning you don't need any additional setup to get the library up and running!
Additionally, the library provides a simple Tesseract instance pooling system (through the TesseractPool class), so you can make asynchronous OCR invocations without managing the individual instances yourself.
Windows is currently the only platform that doesn't require installing extra dependencies.
For Linux distributions it is necessary to install tesseract-ocr.
For distributions that use apt as the package manager (e.g. Ubuntu, Debian, Raspbian) this can be done using sudo apt-get install tesseract-ocr.
Linux support is new and experimental. Problems might arise because tesseract-ocr is not available or because the version found is too old.
iOS is not yet supported.
TesserNet
TesserNet for System.Drawing
TesserNet for ImageSharp
TesserNet for SkiaSharp
This product includes Leptonica, which is available under a "BSD 2-clause" license.
This product includes Tesseract, which is available under an "Apache Version 2.0" license.
When using TesserNet on Linux, make sure tesseract-ocr has been installed on your system.
There are a few example projects available for you to try out in the src directory.
Note that the TesserNet.Example.System.Drawing example uses .NET Framework, meaning it will only run on Windows.
To start off, one first needs to add the following import:
using TesserNet;
One can then create a Tesseract instance:
Tesseract tesseract = new Tesseract();
With that instance one can now perform OCR.
string result = tesseract.Read(...);
By default, the following Read methods are provided:
string Read(byte[] data, int width, int height, int bytesPerPixel);
string Read(byte[] data, int width, int height, int bytesPerPixel, int rectX, int rectY, int rectWidth, int rectHeight);
Task<string> ReadAsync(byte[] data, int width, int height, int bytesPerPixel);
Task<string> ReadAsync(byte[] data, int width, int height, int bytesPerPixel, int rectX, int rectY, int rectWidth, int rectHeight);
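As an illustration, the raw-buffer overloads above might be called as in the following sketch. It assumes the buffer is tightly packed and row-major, so its length is width * height * bytesPerPixel; that layout is an assumption, not something the list above confirms. CreateWhiteBuffer is a hypothetical helper introduced only for this example.

```csharp
using System;
using TesserNet;

public static class RawBufferExample
{
    // Hypothetical helper: builds a blank, all-white pixel buffer.
    // Assumes a tightly packed, row-major layout (an assumption).
    public static byte[] CreateWhiteBuffer(int width, int height, int bytesPerPixel)
    {
        byte[] data = new byte[width * height * bytesPerPixel];
        for (int i = 0; i < data.Length; i++)
        {
            data[i] = 0xFF;
        }

        return data;
    }

    public static void Main()
    {
        const int width = 200;
        const int height = 50;
        const int bytesPerPixel = 1; // one byte per pixel, i.e. grayscale

        // In a real program this buffer would come from a decoded image.
        byte[] pixels = CreateWhiteBuffer(width, height, bytesPerPixel);

        Tesseract tesseract = new Tesseract();

        // Run OCR on the whole buffer.
        string fullText = tesseract.Read(pixels, width, height, bytesPerPixel);

        // Run OCR on the top-left 100x25 region only.
        string regionText = tesseract.Read(pixels, width, height, bytesPerPixel, 0, 0, 100, 25);

        Console.WriteLine(fullText);
        Console.WriteLine(regionText);
    }
}
```

A blank buffer will most likely yield little or no text; it is used here only to keep the sketch self-contained.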
Additionally, if one prefers to use System.Drawing, ImageSharp or SkiaSharp, it is possible to also add a dependency on TesserNet.System.Drawing, TesserNet.ImageSharp or TesserNet.SkiaSharp respectively.
Adding any of these dependencies adds the following Read methods:
string Read(Image image);
string Read(Image image, Rectangle rectangle);
Task<string> ReadAsync(Image image);
Task<string> ReadAsync(Image image, Rectangle rectangle);
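A minimal sketch of the image-based overloads, here using the ImageSharp bridge. It assumes that Image and Rectangle refer to the SixLabors.ImageSharp types and that the added Read methods become visible through using TesserNet; neither is confirmed by the list above. The file name scan.png and the TopHalf helper are hypothetical.

```csharp
using System;
using SixLabors.ImageSharp;
using TesserNet;

public static class ImageExample
{
    // Hypothetical helper: a rectangle covering the top half of an image.
    public static Rectangle TopHalf(int width, int height)
    {
        return new Rectangle(0, 0, width, height / 2);
    }

    public static void Main()
    {
        // "scan.png" is a placeholder path for this sketch.
        using (Image image = Image.Load("scan.png"))
        {
            Tesseract tesseract = new Tesseract();

            // OCR over the whole image.
            string everything = tesseract.Read(image);

            // OCR restricted to the top half of the image.
            string header = tesseract.Read(image, TopHalf(image.Width, image.Height));

            Console.WriteLine(everything);
            Console.WriteLine(header);
        }
    }
}
```

With the System.Drawing or SkiaSharp bridge the same pattern should apply, using that library's own image and rectangle types.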
Furthermore, when trying to use concurrency, it might be useful to have a look at the TesseractPool class:
TesseractPool pool = new TesseractPool();
The TesseractPool class provides a pooling mechanism for running the OCR on multiple Tesseract instances, without having to manually deal with all the different instances.
The class has the following methods:
string Read(byte[] data, int width, int height, int bytesPerPixel);
string Read(byte[] data, int width, int height, int bytesPerPixel, int rectX, int rectY, int rectWidth, int rectHeight);
Task<string> ReadAsync(byte[] data, int width, int height, int bytesPerPixel);
Task<string> ReadAsync(byte[] data, int width, int height, int bytesPerPixel, int rectX, int rectY, int rectWidth, int rectHeight);
And when any of the aforementioned image processing bridging libraries is present:
string Read(Image image);
string Read(Image image, Rectangle rectangle);
Task<string> ReadAsync(Image image);
Task<string> ReadAsync(Image image, Rectangle rectangle);
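To show how the pool might be used for concurrent work, the sketch below starts one ReadAsync call per page and waits for all of them with Task.WhenAll. ReadAllAsync is a hypothetical helper, and the pages are blank placeholder buffers assumed to be tightly packed; how the pool distributes work across instances is not shown here and is left to the library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using TesserNet;

public static class PoolExample
{
    // Hypothetical helper: starts one OCR task per page and
    // returns the results in the same order as the input pages.
    public static Task<string[]> ReadAllAsync(
        TesseractPool pool,
        IEnumerable<byte[]> pages,
        int width,
        int height,
        int bytesPerPixel)
    {
        IEnumerable<Task<string>> tasks =
            pages.Select(page => pool.ReadAsync(page, width, height, bytesPerPixel));

        // Task.WhenAll enumerates the sequence, which starts every read.
        return Task.WhenAll(tasks);
    }

    public static async Task Main()
    {
        const int width = 200;
        const int height = 50;
        const int bytesPerPixel = 1;

        // Placeholder pages; real code would supply decoded image data.
        List<byte[]> pages = new List<byte[]>
        {
            new byte[width * height * bytesPerPixel],
            new byte[width * height * bytesPerPixel],
            new byte[width * height * bytesPerPixel],
        };

        TesseractPool pool = new TesseractPool();
        string[] results = await ReadAllAsync(pool, pages, width, height, bytesPerPixel);

        for (int i = 0; i < results.Length; i++)
        {
            Console.WriteLine($"Page {i}: {results[i]}");
        }
    }
}
```

Sharing a single pool across callers, rather than creating a Tesseract instance per request, appears to be the intended use of the class.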