NCSU-Libraries/ocracoke

Why does the HocrResizer also resize the slope?

Closed this issue · 6 comments

As far as I can see, hocr-resizer.rb resizes all coordinates from bbox as well as the slope and constant term of the baseline. It makes sense that all distances (in the x- and y-directions) are resized. But shouldn't the slope, as the quotient of y-changes and x-changes, stay the same?

Yes, you may very well be right regarding baseline, and I believe I started out not touching these values at all. But when I left these values alone, hocr-pdf misplaced words, IIRC. I did not understand the documentation on baseline well enough to do a better calculation, so I just guessed that this might help. This resizing worked well enough that words were highlighted in the PDF. If you have suggestions on how to do this calculation correctly, they would be appreciated.

I would say that the slope stays the same and all other values are scaled.

For example, look at:
[Image: baseline example]

If we scale this image to 50%, the height and width of the image will be half of what they were before, the x- and y-coordinates of the upper-left and lower-right corners of the box are halved, and the distance from the baseline to the box at the start will be only 9 px. But the angle (and therefore the slope) should stay the same under any uniform rescaling. There would be a difference only if the scaling factors in the x- and y-directions were different (non-uniform scaling).

Put differently, think about driving up a mountain: the slope is the quotient of how much the road goes up compared to how far you move horizontally. This stays the same even if you look at a miniature (scaled) model of the real world.
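To make the argument concrete, here is a minimal Ruby sketch of what happens to a line y = slope * x + constant when every point (x, y) is mapped to (sx * x, sy * y). The method name `scale_baseline` and the sample values are assumptions for illustration only (the constant of -18 is picked so that halving it gives the 9 px mentioned above); this is not code from hocr-resizer.rb.

```ruby
# Hypothetical helper: returns [new_slope, new_constant] for the line
# y = slope * x + constant after scaling x by sx and y by sy.
def scale_baseline(slope, constant, sx, sy = sx)
  # A point (x, y) maps to (x', y') = (sx * x, sy * y), so
  # y' = sy * (slope * x' / sx + constant)
  #    = (slope * sy / sx) * x' + sy * constant
  [slope * sy / sx, constant * sy]
end

# Same factor in both directions: the slope is untouched.
uniform = scale_baseline(0.015, -18.0, 0.5)

# Different factors (x halved, y unchanged): the slope changes.
non_uniform = scale_baseline(0.015, -18.0, 0.5, 1.0)

puts "uniform:     slope=#{uniform[0]} constant=#{uniform[1]}"
puts "non-uniform: slope=#{non_uniform[0]} constant=#{non_uniform[1]}"
```

With a single scale factor the ratio sy / sx is 1, which is why only the constant term needs to be multiplied.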

(You can try this out on the example given in kba/hocr-spec#15 (comment): draw the baseline into the image, apply your rescaling algorithm, and then draw the baseline again into the rescaled image.)
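The same check can be done numerically instead of visually. The sketch below assumes the usual reading of the hOCR baseline (a line whose origin is the bottom-left corner of the bbox, with y growing downward as in image coordinates); the helper name `baseline_endpoints` and the sample bbox and baseline values are made up for illustration.

```ruby
# Hypothetical helper: endpoints of the baseline in page coordinates,
# assuming the baseline is measured from the bottom-left bbox corner (x0, y1).
def baseline_endpoints(bbox, slope, constant)
  x0, _y0, x1, y1 = bbox
  [[x0, y1 + constant], [x1, y1 + slope * (x1 - x0) + constant]]
end

pct = 0.5
bbox = [100, 200, 500, 240]      # illustrative values only
slope = 0.015
constant = -18.0

# Ground truth: take the original endpoints and shrink them with the image.
expected = baseline_endpoints(bbox, slope, constant).map { |x, y| [x * pct, y * pct] }

scaled_bbox = bbox.map { |v| v * pct }

# Variant A: scale only the constant term.
constant_only = baseline_endpoints(scaled_bbox, slope, constant * pct)

# Variant B: scale the slope as well.
slope_too = baseline_endpoints(scaled_bbox, slope * pct, constant * pct)

puts "expected:      #{expected.inspect}"
puts "constant only: #{constant_only.inspect}"
puts "slope too:     #{slope_too.inspect}"
```

Under these assumptions, variant A lands on the expected endpoints, while variant B puts the right-hand end of the baseline at a different height.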

That makes sense. I've updated it with this commit: ddd9318

Thank you.

Actually, this should then result in:

        elsif title_part.include?('baseline')
          # https://github.com/tesseract-ocr/tesseract/wiki/FAQ#how-to-interpret-hocr-baseline-output
          b, slope, constant_term = title_part.split(' ')
          slope = slope.to_f
          constant_term = constant_term.to_f * @pct
          "baseline #{slope} #{constant_term}"
        else
          # Style the same.
          title_part

i.e., scale the constant term but not the slope.
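For anyone who wants to try the rule outside the project, here is a self-contained sketch of the same idea applied to a whole hOCR `title` string. The method `rescale_title`, the way it splits the title on `;`, and the rounding of bbox values are assumptions for this example and are not taken from hocr-resizer.rb.

```ruby
# Hypothetical standalone helper, not the project's implementation.
# Scales bbox coordinates and the baseline constant term by pct,
# leaves the baseline slope and every other property unchanged.
def rescale_title(title, pct)
  title.split(';').map do |part|
    part = part.strip
    if part.start_with?('bbox')
      _label, *coords = part.split(' ')
      # Assumption: bbox values are integers, so round after scaling.
      "bbox #{coords.map { |c| (c.to_i * pct).round }.join(' ')}"
    elsif part.start_with?('baseline')
      _label, slope, constant_term = part.split(' ')
      # Slope is a ratio of distances, so only the constant term is scaled.
      "baseline #{slope.to_f} #{constant_term.to_f * pct}"
    else
      # Anything else is passed through untouched in this sketch.
      part
    end
  end.join('; ')
end

resized = rescale_title('bbox 100 200 300 240; baseline 0.015 -18', 0.5)
puts resized
```

Other length-like properties in a real title would probably need scaling too; the sketch deliberately covers only the two discussed in this thread.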

@zuphilip Thank you very much for your help here and generally improving the state of hOCR. I really appreciate it.

Great! I am happy to help build open source software.