ocropy segment does not handle input files properly
ocrd-cis-ocropy-segment does not segment any image files.
In ocrd_cis/ocropy/segment.py, line 220 looks wrong:
    for (n, input_file) in enumerate(self.input_files):
Shouldn't this be:
    for (n, input_file) in enumerate(self.workspace.mets.find_files(fileGrp=self.input_file_grp)):
Line 113 also seems odd:
    if hasattr(self, 'output_file_grp'):
        try:
            self.output_file_grp, self.image_file_grp = self.output_file_grp.split(',')
        except ValueError:
            self.image_file_grp = FALLBACK_FILEGRP_IMG
            LOG.info("No output file group for images specified, falling back to '%s'", FALLBACK_FILEGRP_IMG)
> In ocrd_cis/ocropy/segment.py, line 220 looks wrong:
>     for (n, input_file) in enumerate(self.input_files):
> Shouldn't this be:
>     for (n, input_file) in enumerate(self.workspace.mets.find_files(fileGrp=self.input_file_grp)):
These two are the same. Recently, however, the input_files property has come to do even more, because it must also differentiate between PAGE and image files in the same fileGrp for the same pageId.
So: no, this is the correct pattern, and since OCR-D/spec#164 we must use input_files.
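To illustrate why iterating over the raw fileGrp is not enough, here is a minimal sketch of the kind of selection described above: one file per pageId, preferring PAGE-XML over an image. It is not the OCR-D implementation; `FileEntry` and `select_input_files` are hypothetical names made up for this example, and the real property may apply different rules.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

# MIME type used for PAGE-XML files
PAGE_MIMETYPE = "application/vnd.prima.page+xml"


@dataclass
class FileEntry:
    """Hypothetical stand-in for a METS file entry (not the OCR-D class)."""
    page_id: str
    mimetype: str
    url: str


def select_input_files(files: Iterable[FileEntry]) -> List[FileEntry]:
    """Keep one file per page_id, preferring PAGE-XML over image files."""
    chosen: Dict[str, FileEntry] = {}
    for entry in files:
        current = chosen.get(entry.page_id)
        if current is None:
            # first file seen for this page
            chosen[entry.page_id] = entry
        elif entry.mimetype == PAGE_MIMETYPE and current.mimetype != PAGE_MIMETYPE:
            # a PAGE-XML file replaces a previously chosen image
            chosen[entry.page_id] = entry
    return list(chosen.values())


if __name__ == "__main__":
    group = [
        FileEntry("P0001", "image/tiff", "P0001.tif"),
        FileEntry("P0001", PAGE_MIMETYPE, "P0001.xml"),
        FileEntry("P0002", "image/png", "P0002.png"),
    ]
    for entry in select_input_files(group):
        print(entry.page_id, entry.url)
```

A plain find_files(fileGrp=...) loop over the same group would visit P0001 twice, once for the image and once for the PAGE file.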
> Line 113 also seems odd:
>     if hasattr(self, 'output_file_grp'):
>         try:
>             self.output_file_grp, self.image_file_grp = self.output_file_grp.split(',')
>         except ValueError:
>             self.image_file_grp = FALLBACK_FILEGRP_IMG
>             LOG.info("No output file group for images specified, falling back to '%s'", FALLBACK_FILEGRP_IMG)
This used to be the correct pattern for processors that write AlternativeImages. (The conditional was necessary because there are other non-processing contexts like --help which have no output_file_grp defined.)
After OCR-D/spec#164 we write them to the same output fileGrp, so this is not needed anymore (see #57).
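For anyone reading the old code later, the former pattern can be sketched in isolation as follows. This is only an illustration of the split-with-fallback logic quoted above: `split_output_file_grp` is a hypothetical helper, and the value given to `FALLBACK_FILEGRP_IMG` here is a placeholder, not the constant's actual value in ocrd_cis.

```python
import logging
from typing import Tuple

LOG = logging.getLogger("segment_sketch")

# Placeholder value for illustration; the real constant may differ.
FALLBACK_FILEGRP_IMG = "OCR-D-IMG-FALLBACK"


def split_output_file_grp(output_file_grp: str) -> Tuple[str, str]:
    """Split 'PAGE_GRP,IMAGE_GRP' into its two parts.

    If the string does not contain exactly two comma-separated names,
    unpacking raises ValueError; the whole string is then kept as the
    PAGE group and the fallback is used for images.
    """
    try:
        page_grp, image_grp = output_file_grp.split(',')
    except ValueError:
        page_grp, image_grp = output_file_grp, FALLBACK_FILEGRP_IMG
        LOG.info("No output file group for images specified, falling back to '%s'",
                 FALLBACK_FILEGRP_IMG)
    return page_grp, image_grp


if __name__ == "__main__":
    print(split_output_file_grp("OCR-D-SEG,OCR-D-IMG-SEG"))
    print(split_output_file_grp("OCR-D-SEG"))
```

With a single output fileGrp for both PAGE and image files, as described above, neither the split nor the fallback is needed.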
Ok. Thanks for the clarification