/tesseractMRZ

Ready-to-use MRZ / MRTD (machine-readable zone / machine-readable travel documents) dataset and models for Tesseract v4

License: BSD 3-Clause "New" or "Revised" License (BSD-3-Clause)

!!New!!
SDK to detect and recognize MRZ/MRTD released at https://github.com/DoubangoTelecom/ultimateMRZ-SDK
If you're looking for information on how to parse or validate MRZ data, check here and here.
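As a rough illustration of what validating MRZ data involves, here is a minimal sketch of the check-digit rule from ICAO Doc 9303 (weights 7, 3, 1 repeated, sum modulo 10). It is not code from this repository, and the function names `char_value` and `check_digit` are made up for the example.

```python
def char_value(ch):
    """Map one MRZ character to its numeric value for check-digit purposes."""
    if "0" <= ch <= "9":
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10  # A=10 ... Z=35
    if ch == "<":
        return 0  # the filler character counts as zero
    raise ValueError(f"invalid MRZ character: {ch!r}")


def check_digit(field):
    """Compute the check digit of an MRZ field (weights 7, 3, 1 repeated)."""
    weights = (7, 3, 1)
    total = sum(char_value(ch) * weights[i % 3] for i, ch in enumerate(field))
    return total % 10


# Document number and date of birth from the ICAO specimen passport.
print(check_digit("L898902C3"))  # 6
print(check_digit("740812"))     # 2
```

A recognized field can be considered plausible when the digit printed after it in the MRZ equals the computed value.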


The dataset

The dataset contains more than 7,000 images (.tif) with ground truth (.gt.txt), collected from Google Images and augmented with a small amount of synthetic data.

The dataset is ready to be used to train with Tesseract v4.
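Before training, it can be useful to confirm that every image has a matching ground-truth file. The sketch below assumes the .tif and .gt.txt files of a pair share the same base name and sit in the same folder. The helper names `find_unpaired` and `scan_dataset` are hypothetical and not part of this repository.

```python
from pathlib import Path

IMAGE_SUFFIX = ".tif"
TRUTH_SUFFIX = ".gt.txt"


def find_unpaired(filenames):
    """Return (images without ground truth, ground truths without image)."""
    images = {n[: -len(IMAGE_SUFFIX)] for n in filenames if n.endswith(IMAGE_SUFFIX)}
    truths = {n[: -len(TRUTH_SUFFIX)] for n in filenames if n.endswith(TRUTH_SUFFIX)}
    return sorted(images - truths), sorted(truths - images)


def scan_dataset(dataset_dir):
    """Walk a dataset folder recursively and report unpaired files."""
    root = Path(dataset_dir)
    names = [str(p.relative_to(root)) for p in root.rglob("*") if p.is_file()]
    return find_unpaired(names)
```

Calling `scan_dataset` on the dataset folder should return two empty lists when every image has its ground truth.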

The models

If you don't want to train the model yourself, try the pre-trained ones in the tessdata_best (float model) or tessdata_fast (int model) folders.
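A minimal sketch of running one of the pre-trained models through the Tesseract command line from Python. It assumes the `tesseract` v4 binary is on your PATH and that the model file in the chosen folder is named `mrz.traineddata` (an assumption, so adjust `lang` to the actual file name). The page segmentation mode 6 is also just a starting guess, and the function names are hypothetical.

```python
import subprocess


def build_tesseract_command(image_path, tessdata_dir="tessdata_best", lang="mrz", psm=6):
    """Build the tesseract CLI call; lang="mrz" assumes a file mrz.traineddata."""
    return [
        "tesseract",
        str(image_path),
        "stdout",                      # print the recognized text instead of writing a file
        "--tessdata-dir", str(tessdata_dir),
        "-l", lang,
        "--psm", str(psm),             # 6 = assume a single uniform block of text
    ]


def recognize_mrz(image_path, **kwargs):
    """Run tesseract on an image and return the recognized text."""
    result = subprocess.run(
        build_tesseract_command(image_path, **kwargs),
        capture_output=True,
        text=True,
        check=True,                    # raise if tesseract exits with an error
    )
    return result.stdout.strip()
```

Pass `tessdata_dir="tessdata_fast"` to use the int model instead of the float one.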

Testing the accuracy

You can check how accurate the MRZ model is at https://www.doubango.org/webapps/mrz/

You may also be interested in our Magnetic ink character recognition (MICR E-13B & CMC-7) implementation at https://github.com/DoubangoTelecom/tesseractMICR with online demo at https://www.doubango.org/webapps/micr/

Getting help

To get help, please check our discussion group or Twitter account.