list index out of range on non-XML fileGrp
bertsky opened this issue · 5 comments
When I try to open a fileGrp that contains only images but no PAGE files, and that does not contain the original image (like OCR-D-IMG) but only derived images (referenced by other fileGrps via AlternativeImage, e.g. OCR-D-IMG-BIN), browse-ocrd crashes with the following error:
Traceback (most recent call last):
  File "ocrd_browser/view/base.py", line 66, in <lambda>
    configurator.connect('changed', lambda _source, *value: self.config_changed(name, value))
  File "ocrd_browser/view/images.py", line 40, in config_changed
    self.reload()
  File "ocrd_browser/view/images.py", line 69, in reload
    self.pages.append(self.document.page_for_id(display_id, self.use_file_group))
  File "ocrd_browser/model/document.py", line 233, in page_for_id
    pcgts = self.page_for_file(page_files[0])
IndexError: list index out of range
(Otherwise display of derived page images works fine.)
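The traceback comes down to taking the first element of an empty list: `page_for_id` reads `page_files[0]` even when the fileGrp has no PAGE-XML file for the requested page. Below is a minimal sketch of that failure mode and a guard. It is not browse-ocrd's code; the `(page_id, mimetype, url)` tuples and the `first_page_file` helper are hypothetical stand-ins for the real METS model.

```python
from typing import List, Optional, Tuple

# Hypothetical stand-in for the METS file entries of one fileGrp: (page_id, mimetype, url)
FileEntry = Tuple[str, str, str]

# Mimetype used for PAGE-XML files in OCR-D workspaces
PAGE_MIMETYPE = "application/vnd.prima.page+xml"

def first_page_file(files: List[FileEntry], page_id: str) -> Optional[str]:
    """Return the URL of the first PAGE-XML file for page_id, or None if there is none."""
    page_files = [url for pid, mime, url in files if pid == page_id and mime == PAGE_MIMETYPE]
    # page_files[0] would raise IndexError on an empty list; next() with a default does not
    return next(iter(page_files), None)

# An image-only fileGrp such as OCR-D-IMG-BIN
bin_grp = [("PHYS_0001", "image/png", "OCR-D-IMG-BIN/PHYS_0001.png")]
print(first_page_file(bin_grp, "PHYS_0001"))  # None instead of IndexError
```

The caller then has to decide what to do with `None`, for example fall back to an image or skip the page.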
Oh, and could you please set the ocr-d topic for this repo to increase its visibility with OCR-D users?
thanks!
Hmm, it does work for image-only fileGrps now, but it still crashes if some fileGrp is missing a pageId (as soon as I step to that page). Re-open or new issue?
Ah, ok. So the fileGrp has no PAGE-XML entry for a particular page_id, right?
Exactly! (This happens when using -g selectively during processing, or when adding pages to a workspace later on.)
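Such gaps can be found by comparing the pages each fileGrp covers against the full list of physical pages. This is a sketch under an assumption: the workspace is reduced to a hypothetical mapping from fileGrp name to the set of page IDs it has files for. It does not use the browse-ocrd or OCR-D API.

```python
from typing import Dict, List, Set

def missing_pages(all_pages: List[str], grp_pages: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """For each fileGrp with gaps, list the physical pages (in document order) it has no file for."""
    return {
        grp: [p for p in all_pages if p not in covered]
        for grp, covered in grp_pages.items()
        if any(p not in covered for p in all_pages)
    }

pages = ["PHYS_0001", "PHYS_0002", "PHYS_0003"]
groups = {
    "OCR-D-IMG": {"PHYS_0001", "PHYS_0002", "PHYS_0003"},
    # e.g. a fileGrp produced by running a processor with -g restricted to some pages
    "OCR-D-GT-PAGE": {"PHYS_0001", "PHYS_0002"},
}
print(missing_pages(pages, groups))  # {'OCR-D-GT-PAGE': ['PHYS_0003']}
```

A viewer stepping to PHYS_0003 in OCR-D-GT-PAGE would hit exactly the empty-lookup case described above.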
Can you check with the latest commit fd02515?
I found it difficult to reproduce a crash. I tried it here (browse-ocrd/tests/model/test_document.py, line 92 in fd02515), so I'm not sure whether this is the fix.
Browsing to a fileGrp without PAGE-XML should now give you a warning:
11:04:37.785 WARNING ocrd_browser.model.document - No PAGE-XML and no image for page 'PHYS_0020' in fileGrp 'OCR-D-GT-PAGE'
and that should be it.
If the problem still persists, an example workspace would be nice.
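The behaviour described above (prefer PAGE-XML, fall back to an image, otherwise warn instead of crashing) can be sketched with the standard `logging` module. This is a hypothetical reconstruction, not the code of commit fd02515: the `resolve_page` function and the `(page_id, mimetype, url)` tuples are assumptions, and only the message text mirrors the warning quoted above.

```python
import logging
from typing import List, Optional, Tuple

log = logging.getLogger(__name__)

# Mimetype used for PAGE-XML files in OCR-D workspaces
PAGE_MIMETYPE = "application/vnd.prima.page+xml"

def resolve_page(files: List[Tuple[str, str, str]], page_id: str,
                 file_grp: str) -> Optional[Tuple[str, str]]:
    """Return ("page", url) or ("image", url) for page_id, or None after logging a warning."""
    matching = [(mime, url) for pid, mime, url in files if pid == page_id]
    # Prefer a PAGE-XML file if the fileGrp has one for this page
    for mime, url in matching:
        if mime == PAGE_MIMETYPE:
            return ("page", url)
    # Otherwise fall back to an image (e.g. an image-only fileGrp)
    for mime, url in matching:
        if mime.startswith("image/"):
            return ("image", url)
    # Nothing usable: warn instead of raising IndexError
    log.warning("No PAGE-XML and no image for page '%s' in fileGrp '%s'", page_id, file_grp)
    return None

logging.basicConfig(level=logging.WARNING)
print(resolve_page([], "PHYS_0020", "OCR-D-GT-PAGE"))  # logs the warning, then prints None
```

Returning `None` rather than raising keeps the view usable: the caller can render an empty placeholder for that page and the user can keep browsing.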
Yes, this works perfectly – thanks a lot!
As for reproducing it: the test in browse-ocrd/tests/model/test_document.py (line 92 in fd02515) is indeed a correct way to simulate missing pages. (You could also have created that workspace dynamically, by removing the page ad hoc in the test, but never mind.)
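Creating the gap dynamically, as suggested, could look like the following sketch. It uses `unittest` with a hypothetical in-memory list of file entries and a hypothetical `page_url` lookup in place of browse-ocrd's real workspace fixtures; the point is only the pattern of starting from a complete fileGrp and removing one page inside the test.

```python
import unittest
from typing import List, Optional, Tuple

# Mimetype used for PAGE-XML files in OCR-D workspaces
PAGE_MIMETYPE = "application/vnd.prima.page+xml"

def page_url(files: List[Tuple[str, str, str]], page_id: str) -> Optional[str]:
    """Hypothetical lookup under test: first PAGE-XML URL for page_id, or None."""
    return next((url for pid, mime, url in files
                 if pid == page_id and mime == PAGE_MIMETYPE), None)

class MissingPageTest(unittest.TestCase):
    def setUp(self):
        # Complete fileGrp: one PAGE-XML file per physical page
        self.files = [
            ("PHYS_0001", PAGE_MIMETYPE, "OCR-D-GT-PAGE/PHYS_0001.xml"),
            ("PHYS_0002", PAGE_MIMETYPE, "OCR-D-GT-PAGE/PHYS_0002.xml"),
        ]

    def test_missing_page_returns_none(self):
        # Remove one page ad hoc instead of relying on a pre-built incomplete workspace
        self.files = [f for f in self.files if f[0] != "PHYS_0002"]
        self.assertIsNone(page_url(self.files, "PHYS_0002"))
        self.assertEqual(page_url(self.files, "PHYS_0001"), "OCR-D-GT-PAGE/PHYS_0001.xml")
```

Saved as a test module, it can be run with `python -m unittest`. The advantage over a checked-in incomplete workspace is that the test data stays in sync with the complete fixture.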
And browsing to a fileGrp without PAGE-XML now gives exactly that warning.