mittagessen/kraken

Is it necessary for `kraken.binarization.nlbin` to raise an error on blank images?

Closed this issue · 2 comments

Not a major issue, but at the moment `kraken.binarization.nlbin` raises a `KrakenInputException('Image is empty')` when a blank image is passed. The problem is that this step is often the first one before segmentation and OCR, so a single blank image can break batch processing. Another shortcoming is that blank images end up with no output at all (say, when the call is wrapped in a try/except inside a loop), which can lead to a mismatch between inputs and outputs.

A suggestion would be to return the blank image and produce an empty output from it (but still an output).
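Until the behavior changes, the mismatch described above can be avoided on the caller's side. The following is only a minimal sketch, not part of kraken: `process_batch`, `step`, `skip_on` and `placeholder` are hypothetical names introduced for illustration. It applies one processing step to every input and keeps an output slot for inputs that raise the given exception, so inputs and outputs stay aligned.

```python
from typing import Callable, Iterable, List, Optional, Tuple, Type, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def process_batch(
    items: Iterable[T],
    step: Callable[[T], R],
    skip_on: Tuple[Type[BaseException], ...],
    placeholder: Optional[R] = None,
) -> List[Tuple[T, Optional[R]]]:
    """Apply `step` to every item, keeping exactly one result per input.

    Hypothetical helper: if `step` raises one of the exceptions listed in
    `skip_on`, the item is paired with `placeholder` instead of being
    dropped. Any other exception still propagates to the caller.
    """
    results: List[Tuple[T, Optional[R]]] = []
    for item in items:
        try:
            results.append((item, step(item)))
        except skip_on:
            # Keep a slot for this input so outputs stay aligned with inputs.
            results.append((item, placeholder))
    return results
```

With kraken, `step` would presumably be `kraken.binarization.nlbin` and `skip_on` would be `(KrakenInputException,)`, imported from `kraken.lib.exceptions`. Since that exception may also be raised for other invalid inputs, checking the message for `'Image is empty'` before substituting a placeholder would be the more careful choice.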

It's mostly an artifact of the original ocropus code. The legacy segmenter that would absolutely require binarization couldn't deal with empty images so the best point to abort processing was in there.

I'm a bit loath to change its behavior now as binarization isn't really used anymore. The segmenter doesn't need it and (heavily degraded) material where it would be potentially helpful at recognition time doesn't work particularly well with the rather crude algorithm. May I ask how you're using it in your pipeline?

@mittagessen sorry for my (extremely) late reply! I fully get your point, I was just mentioning this problem in passing. I was using binarization for experimental purposes (in my case, to compare kraken's legacy line segmenter with blla). So basically the pipeline was image -> nlbin -> line segmenter :) But as I suspected, not a major issue.