hnesk/browse-ocrd

list index out of range on non-XML fileGrp

bertsky opened this issue · 5 comments

When I open a fileGrp that contains only images and no PAGE files, and those images are not the originals (like OCR-D-IMG) but only derived images referenced by other fileGrps via AlternativeImage (e.g. OCR-D-IMG-BIN), browse-ocrd crashes with the following error:

Traceback (most recent call last):
  File "ocrd_browser/view/base.py", line 66, in <lambda>
    configurator.connect('changed', lambda _source, *value: self.config_changed(name, value))
  File "ocrd_browser/view/images.py", line 40, in config_changed
    self.reload()
  File "ocrd_browser/view/images.py", line 69, in reload
    self.pages.append(self.document.page_for_id(display_id, self.use_file_group))
  File "ocrd_browser/model/document.py", line 233, in page_for_id
    pcgts = self.page_for_file(page_files[0])
IndexError: list index out of range

(Otherwise display of derived page images works fine.)
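The last frame of the traceback shows `page_for_id` taking `page_files[0]` without checking that the list is non-empty. For an image-only fileGrp there is no PAGE-XML entry for the page, so the list is empty and indexing it raises `IndexError`. The following is a minimal sketch of a guarded lookup; `FileEntry` and `first_page_file` are hypothetical stand-ins, not browse-ocrd's or OCR-D core's actual API.

```python
from dataclasses import dataclass
from typing import List, Optional

# MIME type used for PAGE-XML files in OCR-D METS files.
PAGE_MIMETYPE = "application/vnd.prima.page+xml"


@dataclass
class FileEntry:
    """Hypothetical stand-in for one METS file entry (not the real ocrd class)."""
    file_grp: str
    page_id: str
    mimetype: str
    url: str


def first_page_file(files: List[FileEntry], page_id: str, file_grp: str) -> Optional[FileEntry]:
    """Return the first PAGE-XML entry for page_id in file_grp, or None.

    Unlike an unconditional `page_files[0]`, this does not raise IndexError
    when the fileGrp holds only images or has no entry for the page at all.
    """
    page_files = [
        f for f in files
        if f.file_grp == file_grp and f.page_id == page_id and f.mimetype == PAGE_MIMETYPE
    ]
    return page_files[0] if page_files else None
```

The caller then decides what a `None` result means (fall back to an image, or skip the page) instead of crashing.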

Oh, and could you please set the ocr-d topic for this repo to increase its visibility with OCR-D users?

309b83c

thanks!

Hmm, it does work for image-only fileGrps now, but the crash still happens if some fileGrp is missing a pageId (as soon as I step to that page). Re-open this one or file a new issue?

hnesk commented

Ah, ok. So the fileGrp has no PAGE-XML entry for a particular page_id, right?

> So the fileGrp has no PAGE-XML entry for a particular page_id, right?

Exactly! (This happens when using -g selectively during processing, or when adding pages to a workspace later on.)
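When only some pages were processed, or pages were added afterwards, a fileGrp simply has no file for certain physical pages. A small sketch of how one might list those gaps per fileGrp, assuming the (fileGrp, pageId) pairs have already been collected from the METS file; `missing_pages` is a hypothetical helper, not part of browse-ocrd.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def missing_pages(all_page_ids: List[str], entries: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each fileGrp to the physical page IDs it has no file for.

    `entries` are hypothetical (fileGrp, pageId) pairs. Only fileGrps that
    appear in `entries` and lack at least one page are reported; the order
    of `all_page_ids` is preserved in each list.
    """
    present: Dict[str, Set[str]] = defaultdict(set)
    for file_grp, page_id in entries:
        present[file_grp].add(page_id)
    return {
        grp: [p for p in all_page_ids if p not in ids]
        for grp, ids in present.items()
        if any(p not in ids for p in all_page_ids)
    }
```

A viewer could use such a map to warn up front instead of failing when the user steps to one of the listed pages.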

hnesk commented

Can you check with the latest commit fd02515?
I found it difficult to reproduce the crash; I tried it here:

def test_page_for_id_with_nothing_for_page_and_fileGrp(self):

so I'm not sure if this is the fix.

Browsing to a fileGrp without PAGE-XML should now give you a warning:
11:04:37.785 WARNING ocrd_browser.model.document - No PAGE-XML and no image for page 'PHYS_0020' in fileGrp 'OCR-D-GT-PAGE'
and that should be it.
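The behaviour described above can be sketched as a fallback chain: use the PAGE-XML if there is one, otherwise the image, otherwise log a warning and return nothing. The logger name and message format below are taken from the log line above; the index dictionaries and the `resolve_page` function are hypothetical and only illustrate the idea, not the code in fd02515.

```python
import logging
from typing import Dict, Optional, Tuple

log = logging.getLogger("ocrd_browser.model.document")

# Hypothetical index: (fileGrp, pageId) -> file URL.
PageIndex = Dict[Tuple[str, str], str]


def resolve_page(page_xml: PageIndex, images: PageIndex,
                 page_id: str, file_grp: str) -> Optional[Tuple[str, str]]:
    """Return ("page", url) or ("image", url); log a warning and return None otherwise."""
    key = (file_grp, page_id)
    if key in page_xml:
        return ("page", page_xml[key])
    if key in images:
        # Image-only fileGrp: the caller has to build the view from the image.
        return ("image", images[key])
    log.warning("No PAGE-XML and no image for page '%s' in fileGrp '%s'", page_id, file_grp)
    return None
```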

If the problem persists, an example workspace would help.

> Can you check with the latest commit fd02515?

Yes, this works perfectly – thanks a lot!

> I found it difficult to reproduce a crash, I tried it here
>
> def test_page_for_id_with_nothing_for_page_and_fileGrp(self):

Indeed, this is a correct way to simulate missing pages. (You could also have created that workspace dynamically by removing the page ad hoc in the test, but never mind.)
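The "dynamic" alternative mentioned above could look roughly like this: build a complete workspace in `setUp` and delete one page from one fileGrp inside the test itself, so no separate fixture with a gap is needed. The in-memory `Workspace` dict and the helpers are hypothetical; a real test would manipulate an OCR-D workspace and METS file instead.

```python
import unittest
from typing import Dict, List, Optional

# Hypothetical in-memory workspace: fileGrp -> {pageId: url}.
Workspace = Dict[str, Dict[str, str]]


def build_workspace(page_ids: List[str], file_grps: List[str]) -> Workspace:
    """Create a complete workspace with one file per page in every fileGrp."""
    return {grp: {p: f"{grp}/{p}" for p in page_ids} for grp in file_grps}


def lookup(ws: Workspace, page_id: str, file_grp: str) -> Optional[str]:
    """Return the file URL, or None if the fileGrp lacks this page."""
    return ws.get(file_grp, {}).get(page_id)


class MissingPageTest(unittest.TestCase):
    def setUp(self) -> None:
        self.ws = build_workspace(["PHYS_0001", "PHYS_0002"], ["OCR-D-IMG", "OCR-D-GT-PAGE"])

    def test_lookup_with_page_removed_ad_hoc(self) -> None:
        # Remove the page from one fileGrp here instead of shipping a gapped fixture.
        del self.ws["OCR-D-GT-PAGE"]["PHYS_0002"]
        self.assertIsNone(lookup(self.ws, "PHYS_0002", "OCR-D-GT-PAGE"))
        self.assertEqual(lookup(self.ws, "PHYS_0002", "OCR-D-IMG"), "OCR-D-IMG/PHYS_0002")
```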

> Browsing to a fileGrp without PAGE-XML should now give you a warning:

It does exactly that.