Extracting Text-Lines from images using the trained model
@TobiasGruening Thank you for your hard work.
How can we use the trained model to extract the text-lines?
@TobiasGruening This is really amazing.
Along the same lines as the question above: is there any way to print out the coordinates of the recognized text segments, instead of plotting them on new images for display? I've been trying to modify the inference method, but I can't quite figure out how to print the coordinates for each recognized line segment. Any hints would be greatly appreciated.
Hey, sorry for the delayed answer.
The ARU-Net framework performs pixel labeling; it does not directly parametrize the text lines, i.e., coordinate information is not available out of the box.
A sophisticated approach to extracting text lines from the raw ARU-Net output is described in "A Two-Stage Method for Text Line Detection in Historical Documents"; the ARU-Net constitutes just the first stage. Sadly, the second stage is (due to licensing issues) not open source :(
However, one could think about binarizing the ARU-Net output and calculating its skeleton. These are standard image processing techniques, and various open-source implementations are available. Combining this with some polynomial regression should give you quite reasonable parametrized baseline representations, which can then be used to compute surrounding polygons. This is typically the desired input for downstream OCR frameworks...
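For illustration, here is a minimal sketch of that binarize/skeletonize/regress idea using NumPy and scikit-image. The file name `aru_net_output.png`, the Otsu threshold, the minimum component size, and the polynomial degree are all placeholder assumptions, not part of the ARU-Net repository:

```python
import numpy as np
from skimage import io, measure
from skimage.filters import threshold_otsu
from skimage.morphology import skeletonize

# Load the raw ARU-Net prediction map (pixel-wise baseline probabilities).
# "aru_net_output.png" is a placeholder; point this at your own output.
prob_map = io.imread("aru_net_output.png", as_gray=True)

# 1. Binarize the probability map (Otsu is one simple threshold choice).
binary = prob_map > threshold_otsu(prob_map)

# 2. Thin each baseline blob down to a one-pixel-wide skeleton.
skeleton = skeletonize(binary)

# 3. Split the skeleton into connected components, one per candidate baseline.
labels = measure.label(skeleton, connectivity=2)

baselines = []
for region in measure.regionprops(labels):
    rows, cols = region.coords[:, 0], region.coords[:, 1]
    if cols.size < 10:  # drop tiny spurious components
        continue
    # 4. Fit a low-degree polynomial y = f(x) as a smooth,
    #    parametrized baseline representation.
    coeffs = np.polyfit(cols, rows, deg=2)
    xs = np.arange(cols.min(), cols.max() + 1)
    ys = np.polyval(coeffs, xs)
    baselines.append(list(zip(xs.tolist(), ys.tolist())))

# Print the coordinates instead of plotting them.
for i, line in enumerate(baselines):
    print(f"baseline {i}: start={line[0]}, end={line[-1]}, {len(line)} points")
```

Running this prints the start and end coordinates of each detected baseline rather than plotting them; degree 2 keeps the fit robust for gently curved lines, while strongly warped documents may call for a higher degree or a spline.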
Don't hesitate to ask further questions; I will hopefully respond faster :)
You can get baseline coordinates using this repository, and then extract the lines themselves using pageLineExtractor.