OCR-D/ocrd_anybaseocr

block segmentation: overlaps and quality of prebuilt models

bertsky opened this issue · 0 comments

Once I got the block segmentation to actually run, I was puzzled by the extremely poor results of the provided model.

Here's how I gradually worked to isolate the problem.

  • using default 0.9 confidence threshold:
(screenshots: FILE_0001_REGIONS-ANYOCR_bbox-best_pageviewer, FILE_0002_REGIONS-ANYOCR_bbox-best_pageviewer)
  • using lower 0.5 confidence threshold:
(screenshots: FILE_0001_REGIONS-ANYOCR_bbox-all_pageviewer, FILE_0002_REGIONS-ANYOCR_bbox-all_pageviewer)
  • using default 0.9 confidence threshold, but annotating a polygon from the mask:
(screenshots: FILE_0001_REGIONS-ANYOCR_mask-best_pageviewer, FILE_0002_REGIONS-ANYOCR_mask-best_pageviewer)
  • using lower 0.5 confidence threshold, but annotating a polygon from the mask:
(screenshots: FILE_0001_REGIONS-ANYOCR_mask-all_pageviewer, FILE_0002_REGIONS-ANYOCR_mask-all_pageviewer)
  • using lower 0.5 confidence threshold, but annotating a polygon from the mask, and doing non-maximum suppression and other post-processing (like checking for containment):
(screenshots: FILE_0001_REGIONS-ANYOCR_mask-all-nms_pageviewer, FILE_0002_REGIONS-ANYOCR_mask-all-nms_pageviewer)
  • using even lower 0.02 confidence threshold, but annotating a polygon from the mask, and suppressing the classes header, footer, footnote, footnote-continued, endnote, keynote (reserving their probability mass):
(screenshots: FILE_0001_REGIONS-ANYOCR_mask-all-active_pageviewer, FILE_0002_REGIONS-ANYOCR_mask-all-active_pageviewer)
  • using even lower 0.02 confidence threshold, but annotating a polygon from the mask, and suppressing the classes header, footer, footnote, footnote-continued, endnote, keynote (reserving their probability mass), and doing non-maximum suppression and other post-processing (like checking for containment):
(screenshots: FILE_0001_REGIONS-ANYOCR_mask-all-active-nms_pageviewer, FILE_0002_REGIONS-ANYOCR_mask-all-active-nms_pageviewer)
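To illustrate the "polygon from the mask" variants above, here is a minimal sketch of deriving a polygon outline from a binary instance mask instead of the rectangular bounding box. For simplicity it only traces row-wise extremes (valid for vertically convex shapes); a real implementation would do full contour tracing (e.g. via `cv2.findContours`). All names are illustrative, not the actual processor code.

```python
# Sketch: turn a binary instance mask into a polygon ring, instead of a bbox.
# Assumption: shapes are vertically convex, so per-row extremes suffice.
import numpy as np

def mask_to_polygon(mask):
    """mask: 2-D bool/0-1 array. Returns a list of (x, y) polygon points."""
    rows = np.where(mask.any(axis=1))[0]     # rows containing foreground
    left, right = [], []
    for y in rows:
        xs = np.where(mask[y])[0]
        left.append((int(xs[0]), int(y)))    # left edge, top to bottom
        right.append((int(xs[-1]), int(y)))  # right edge
    return left + right[::-1]                # close the ring: down left, up right
```

The resulting point list can then be annotated as the region's `Coords/@points` instead of the axis-aligned bounding box.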
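The confidence filtering, non-maximum suppression and containment check used above can be sketched as follows. This is a hypothetical, simplified re-implementation for illustration; function names and thresholds are my own, not the processor's actual API.

```python
# Sketch: filter detections by confidence, then greedy NMS by IoU, plus
# suppression of boxes (almost) fully contained in a higher-scoring box.
# Thresholds are illustrative assumptions.
import numpy as np

def postprocess(boxes, scores, conf_thresh=0.5, iou_thresh=0.5, contain_thresh=0.9):
    """boxes: (N, 4) array of [x0, y0, x1, y1]; scores: (N,) confidences."""
    keep_mask = scores >= conf_thresh
    boxes, scores = boxes[keep_mask], scores[keep_mask]
    order = np.argsort(-scores)              # highest confidence first
    kept = []
    for i in order:
        x0, y0, x1, y1 = boxes[i]
        area_i = (x1 - x0) * (y1 - y0)
        suppressed = False
        for j in kept:
            u0, v0, u1, v1 = boxes[j]
            iw = max(0.0, min(x1, u1) - max(x0, u0))
            ih = max(0.0, min(y1, v1) - max(y0, v0))
            inter = iw * ih
            area_j = (u1 - u0) * (v1 - v0)
            iou = inter / (area_i + area_j - inter)
            containment = inter / area_i     # how much of i lies inside j
            if iou > iou_thresh or containment > contain_thresh:
                suppressed = True
                break
        if not suppressed:
            kept.append(i)
    return boxes[kept], scores[kept]
```

A lower-scoring region that is almost entirely inside an already-kept region is dropped even when its IoU with that region is small, which is exactly the containment case that plain NMS misses.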

So all these refinements seem crucial.
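The class suppression with reserved probability mass can be sketched like this: zero out the scores of the visually indistinguishable classes and renormalise each detection's distribution so the suppressed mass flows to the remaining classes. The class inventory and distribution shape here are assumptions for illustration only.

```python
# Sketch: suppress visually indistinguishable classes and redistribute
# their probability mass to the surviving classes. Class list is assumed.
import numpy as np

CLASSES = ["paragraph", "heading", "header", "footer", "footnote",
           "footnote-continued", "endnote", "keynote"]
SUPPRESS = {"header", "footer", "footnote", "footnote-continued", "endnote", "keynote"}

def redistribute(probs):
    """probs: (N, C) per-detection class distributions, each row summing to 1."""
    probs = probs.copy()
    keep = np.array([c not in SUPPRESS for c in CLASSES])
    probs[:, ~keep] = 0.0
    # renormalise so each row sums to 1 again (the suppressed mass is
    # "reserved" and reassigned proportionally to the remaining classes)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs
```

With the suppressed classes removed, detections that would otherwise have been swallowed by e.g. the footnote class can still clear the (now very low) confidence threshold as plain text regions.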

But it appears that this model was trained on highly overlapping regions – which makes it next to impossible to avoid such overlaps during prediction. An equally serious problem seems to be the nature of the classification applied: footnotes are simply not visually distinguishable from other text regions (only textually/logically), so they usurp all the energy (probability mass) of their look-alikes. IMHO, an adequate model would treat this subclassification as a secondary task.

Hence, inevitably, we need to retrain this.

@n00blet @mahmed1995 @khurramHashmi @mjenckel can you please provide details about the training procedure and dataset you used? There's virtually nothing about this in the OCR-D reader, and your final DFG presentation poster only references one paper on page frame detection and one on dewarping. Am I correct in assuming this repo is where your training tools reside?