Generate data (text -> line images and labels) with the following command.
python3 extract-lines.py <src text file> <output-dir>
<output-dir> will be populated with a set of JPEGs and text files (the labels). The naming convention is xxx.<quality>.png for images and xxx.txt for labels.
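To make the naming convention concrete, here is a minimal sketch of how the generated files could be grouped back into (label, images) samples. The helper name pair_images_with_labels is hypothetical and not part of this repository; the sketch assumes each xxx.txt holds the label text for every xxx.<quality> image, and it accepts .png, .jpg and .jpeg extensions because the exact extension may differ.

```python
from collections import defaultdict
from pathlib import Path

# Assumption: image extensions that extract-lines.py might emit.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def pair_images_with_labels(data_dir):
    """Group files named xxx.<quality>.<ext> with the label stored in xxx.txt.

    Hypothetical helper for illustration; returns
    {"xxx": {"label": str or None, "images": {quality: Path}}}.
    """
    samples = defaultdict(lambda: {"label": None, "images": {}})
    for path in sorted(Path(data_dir).iterdir()):
        if path.suffix == ".txt":
            # Assumption: the label file contains the text of the line.
            text = path.read_text(encoding="utf-8")
            samples[path.stem]["label"] = text.rstrip("\n")
        elif path.suffix in IMAGE_SUFFIXES:
            # "xxx.<quality>" -> base "xxx", quality "<quality>"
            base, _, quality = path.stem.rpartition(".")
            if base:
                samples[base]["images"][quality] = path
    return dict(samples)
```

Splitting the quality off from the right keeps the sketch working even if xxx itself contains dots.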
Generate character images (text image -> individual character images) with the following command.
python3 segment.py <src text image file> <output-dir>
<output-dir> will contain one image for each text line of the input image, named line<X>.<quality>.jpg, and one image for each character of the input image, named line<X>.char<Y>.<quality>.jpg.
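The file names above can be parsed back into line and character indices, for example to put the character images in reading order. The following is a sketch only: parse_segment_name and chars_in_reading_order are hypothetical helpers, not part of segment.py, and they assume that <X> and <Y> are integers and that <quality> contains no dots.

```python
import re

# Assumption: <X> and <Y> are integers, <quality> has no dots.
CHAR_RE = re.compile(r"^line(\d+)\.char(\d+)\.([^.]+)\.jpg$")
LINE_RE = re.compile(r"^line(\d+)\.([^.]+)\.jpg$")


def parse_segment_name(name):
    """Return a dict describing a segment.py output file name, or None."""
    m = CHAR_RE.match(name)
    if m:
        return {"kind": "char", "line": int(m.group(1)),
                "char": int(m.group(2)), "quality": m.group(3)}
    m = LINE_RE.match(name)
    if m:
        return {"kind": "line", "line": int(m.group(1)),
                "char": None, "quality": m.group(2)}
    return None


def chars_in_reading_order(names):
    """Keep only character images, sorted by (line index, char index)."""
    keyed = []
    for name in names:
        info = parse_segment_name(name)
        if info and info["kind"] == "char":
            keyed.append((info["line"], info["char"], name))
    return [name for _, _, name in sorted(keyed)]
```

Sorting on the parsed integers rather than on the raw strings keeps line10 after line2.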
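Train the model on the generated data with the following command.
python3 src/train.py <data-dir generated by extract-lines.py>
Run inference on an image of a single line with the following command.
python3 src/infererence.py ocr.model <image of a single line>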