
kraken_to_pagexml

A script that converts Kraken's JSON segmentation output into PageXML.

Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0).

Possible workflow

Image binarization

Put your scans (PNG files) in a folder and run the following inside it:

for i in *.png
    do kraken -i "$i" "${i%.png}.bin.png" binarize
    done
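If you would rather drive this step from Python, the binarization loop can be sketched as follows. This is an illustration only, not part of kraken_to_pagexml: the helper names `binarized_name` and `binarize_all` are hypothetical, and the sketch assumes the `kraken` executable is on your PATH. With `dry_run=True` (the default) it only builds the command lines without running them.

```python
import subprocess
from pathlib import Path


def binarized_name(scan: Path) -> Path:
    """Map 'page01.png' to 'page01.bin.png', the naming used by the shell loop."""
    return scan.with_name(scan.stem + ".bin.png")


def binarize_all(folder: Path, dry_run: bool = True) -> list:
    """Build one `kraken -i IN OUT binarize` command per scan; run them unless dry_run."""
    commands = []
    for scan in sorted(folder.glob("*.png")):
        # *.png also matches earlier outputs such as page01.bin.png; skip them.
        if scan.name.endswith(".bin.png"):
            continue
        cmd = ["kraken", "-i", str(scan), str(binarized_name(scan)), "binarize"]
        commands.append(cmd)
        if not dry_run:
            # Raises subprocess.CalledProcessError if kraken exits with an error.
            subprocess.run(cmd, check=True)
    return commands
```

For example, `binarize_all(Path("scans"), dry_run=False)` would run kraken once for every scan in `scans/`. Skipping `*.bin.png` inputs keeps a second run from binarizing its own output again.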

Image segmentation

Then run baseline segmentation on the binarized images; this produces one JSON file per page:

for i in *.bin.png
    do kraken -i "$i" "${i%.bin.png}.json" segment -bl
    done
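Before converting, it can help to check what the segmentation produced. The sketch below assumes the layout of kraken's baseline segmentation JSON: a top-level `lines` list whose entries carry `baseline` and `boundary` point lists. The keys may differ between kraken versions, so treat this layout as an assumption; `summarize_segmentation` is a hypothetical helper.

```python
import json
from pathlib import Path


def summarize_segmentation(json_path: Path) -> dict:
    """Count the lines kraken found and how many lack a baseline or boundary.

    Assumes (not verified for every kraken version) a top-level "lines"
    list of dicts with "baseline" and "boundary" keys.
    """
    with open(json_path, encoding="utf-8") as fh:
        seg = json.load(fh)
    lines = seg.get("lines", [])
    return {
        "type": seg.get("type"),
        "num_lines": len(lines),
        "missing_boundary": sum(1 for line in lines if not line.get("boundary")),
        "missing_baseline": sum(1 for line in lines if not line.get("baseline")),
    }
```

For example, `summarize_segmentation(Path("page01.json"))` reports the line count for one page; a page with zero lines usually points to a problem in the binarization or segmentation step.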

Create pagexml

Now run the script:

python kraken_to_pagexml.py *.json
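To give an idea of what such a conversion involves, here is a minimal sketch that maps one segmentation onto PageXML. It is not the repository's implementation: `kraken_json_to_pagexml`, `convert_file` and the other helpers are hypothetical. It assumes the `lines`/`baseline`/`boundary` JSON layout, puts all lines into a single TextRegion whose outline is their bounding box, and takes the image size as arguments (kraken's JSON may not record it, so you might read it from the image instead). The namespace is that of the 2019-07-15 PAGE schema.

```python
import json
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Namespace of the 2019-07-15 PAGE content schema, serialized as the default namespace.
PAGE_NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"
ET.register_namespace("", PAGE_NS)


def q(tag: str) -> str:
    """Qualify a tag with the PAGE namespace (ElementTree's {ns}tag form)."""
    return f"{{{PAGE_NS}}}{tag}"


def points_to_str(points) -> str:
    """Format [[x, y], ...] as the PageXML points string 'x1,y1 x2,y2 ...'."""
    return " ".join(f"{int(x)},{int(y)}" for x, y in points)


def bbox_polygon(polygons):
    """Axis-aligned bounding box of several polygons, as a 4-point polygon."""
    xs = [x for poly in polygons for x, _ in poly]
    ys = [y for poly in polygons for _, y in poly]
    x0, y0, x1, y1 = min(xs), min(ys), max(xs), max(ys)
    return [[x0, y0], [x1, y0], [x1, y1], [x0, y1]]


def kraken_json_to_pagexml(seg: dict, image_filename: str, width: int, height: int) -> str:
    """Build a PageXML string from a kraken baseline segmentation (assumed layout)."""
    root = ET.Element(q("PcGts"))
    meta = ET.SubElement(root, q("Metadata"))
    now = datetime.now(timezone.utc).isoformat(timespec="seconds")
    ET.SubElement(meta, q("Creator")).text = "kraken_to_pagexml sketch"
    ET.SubElement(meta, q("Created")).text = now
    ET.SubElement(meta, q("LastChange")).text = now

    page = ET.SubElement(root, q("Page"), imageFilename=image_filename,
                         imageWidth=str(width), imageHeight=str(height))
    lines = seg.get("lines", [])
    boundaries = [line["boundary"] for line in lines if line.get("boundary")]
    # One region per page: the bounding box of all line polygons,
    # or the whole page if kraken found no lines.
    if boundaries:
        region_poly = bbox_polygon(boundaries)
    else:
        region_poly = [[0, 0], [width, 0], [width, height], [0, height]]
    region = ET.SubElement(page, q("TextRegion"), id="r0")
    ET.SubElement(region, q("Coords"), points=points_to_str(region_poly))

    for i, line in enumerate(lines):
        text_line = ET.SubElement(region, q("TextLine"), id=f"r0_l{i}")
        ET.SubElement(text_line, q("Coords"), points=points_to_str(line.get("boundary", [])))
        ET.SubElement(text_line, q("Baseline"), points=points_to_str(line.get("baseline", [])))
    return ET.tostring(root, encoding="unicode")


def convert_file(json_path: str, image_filename: str, width: int, height: int) -> str:
    """Read one kraken JSON file and write a .xml file next to it; return its path."""
    with open(json_path, encoding="utf-8") as fh:
        seg = json.load(fh)
    out_path = json_path.rsplit(".", 1)[0] + ".xml"
    with open(out_path, "w", encoding="utf-8") as fh:
        fh.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        fh.write(kraken_json_to_pagexml(seg, image_filename, width, height))
    return out_path
```

A single page-sized region keeps the sketch short; tools such as LAREX let you split or redraw regions afterwards.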

The resulting PageXML files can then be processed further with tools such as LAREX or nashi.
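Before loading the files into one of these tools, a quick sanity check can confirm that each page has text lines with baselines. The hypothetical helper below uses only the standard library and matches element names regardless of namespace, since PageXML files may use different PAGE schema versions.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip an ElementTree '{namespace}' prefix from a tag name."""
    return tag.rsplit("}", 1)[-1]


def summarize_pagexml(xml_text: str) -> dict:
    """Report the image name, the number of TextLines, and how many have a Baseline."""
    root = ET.fromstring(xml_text)
    page = next(el for el in root.iter() if local_name(el.tag) == "Page")
    lines = [el for el in root.iter() if local_name(el.tag) == "TextLine"]
    with_baseline = sum(
        1 for line in lines if any(local_name(child.tag) == "Baseline" for child in line)
    )
    return {
        "image": page.get("imageFilename"),
        "text_lines": len(lines),
        "with_baseline": with_baseline,
    }
```

For example, `summarize_pagexml(open("page01.xml", encoding="utf-8").read())` could be called for each output file; pages where `with_baseline` is lower than `text_lines` are worth a closer look.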