other MIME types
bertsky commented
Without digging, I am not sure why exactly, but trying to open the PAGE-XML view on a workspace with ALTO files (text/xml) gives this:
  File "ocrd_browser/view/base.py", line 66, in <lambda>
    configurator.connect('changed', lambda _source, *value: self.config_changed(name, value))
  File "ocrd_browser/view/xml.py", line 50, in config_changed
    self.reload()
  File "ocrd_browser/view/base.py", line 86, in reload
    self.current = self.document.page_for_id(self.page_id, self.use_file_group)
  File "ocrd_browser/model/document.py", line 356, in page_for_id
    image, _, _ = self.workspace.image_from_page(pcgts.get_Page(), page_id)
  File "ocrd/workspace.py", line 384, in image_from_page
    page_image = self._resolve_image_as_pil(page.imageFilename)
  File "ocrd/workspace.py", line 295, in _resolve_image_as_pil
    pil_image = Image.open(image_filename)
  File "PIL/Image.py", line 2930, in open
    raise UnidentifiedImageError(
PIL.UnidentifiedImageError: cannot identify image file 'FULLTEXT/FILE_0001_FULLTEXT'
It looks like it tried to interpret the ALTO file as an image (and to generate a PAGE-XML for it).
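The traceback suggests the ALTO file ended up in the image loader (_resolve_image_as_pil) simply because it was not recognised as PAGE-XML. Below is a minimal sketch of a stricter dispatch. It assumes the caller can see each file's METS MIME type, and it refuses anything that is neither PAGE-XML nor image/* rather than guessing that it is an image. choose_loader is a hypothetical name, not part of ocrd or ocrd_browser. application/vnd.prima.page+xml is the MIME type OCR-D uses for PAGE-XML.

```python
from typing import Optional

# MIME type OCR-D uses for PAGE-XML files
PAGE_MIMETYPE = "application/vnd.prima.page+xml"


def choose_loader(mimetype: str) -> Optional[str]:
    """Hypothetical dispatcher: decide how a METS file entry should be loaded.

    Returns "page" for PAGE-XML, "image" for image/* types, and None for
    anything else (e.g. ALTO declared as text/xml), so the caller can skip
    the file instead of handing it to PIL.
    """
    if mimetype == PAGE_MIMETYPE:
        return "page"
    if mimetype.startswith("image/"):
        return "image"
    return None


for mt in ("application/vnd.prima.page+xml", "image/tiff", "text/xml"):
    print(mt, "->", choose_loader(mt))
```

Returning None instead of raising leaves the decision to the view, which could, for example, grey out such fileGrps.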
hnesk commented
Sorry, I can't reproduce that. Do you have an example workspace?
bertsky commented
> Do you have an example workspace?
I do:
ocrd workspace clone -a "https://digital.slub-dresden.de/oai/?verb=GetRecord&metadataPrefix=mets&identifier=oai:de:slub-dresden:db:id-39946221X-18560530"
browse-ocrd mets.xml
(Here, FULLTEXT contains ALTO files correctly specified as text/xml, which our new document.page_for_id tries to pick up as PAGE-XML. However, with the current version I don't see the above crash anymore: now the PageView and TextView simply forbid selecting any fileGrps.)
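ALTO and PAGE-XML can both appear in a METS under the generic text/xml, so the MIME type alone cannot tell them apart. One way to distinguish them, sketched here on the assumption that the file can be opened locally, is to check the namespace of the root element. PAGE namespaces start with http://schema.primaresearch.org/PAGE/gts/pagecontent/, and ALTO v2 and later namespaces start with http://www.loc.gov/standards/alto/. sniff_xml_flavor is a hypothetical helper, not an existing ocrd_browser function. The sketch does not recognise ALTO v1 or files without a namespace.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO

# Namespace prefixes of the PAGE schema and of ALTO v2+ (assumption: v1 ignored)
PAGE_NS_PREFIX = "http://schema.primaresearch.org/PAGE/gts/pagecontent/"
ALTO_NS_PREFIX = "http://www.loc.gov/standards/alto/"


def sniff_xml_flavor(stream: BinaryIO) -> str:
    """Hypothetical helper: classify an XML file by its root element's namespace.

    Stops at the first "start" event (the root element), so large files
    are not parsed completely.
    """
    for _event, elem in ET.iterparse(stream, events=("start",)):
        tag = elem.tag  # "{namespace}localname", or just "localname"
        ns = tag[1:].split("}", 1)[0] if tag.startswith("{") else ""
        if ns.startswith(PAGE_NS_PREFIX):
            return "page"
        if ns.startswith(ALTO_NS_PREFIX):
            return "alto"
        return "unknown"
    return "unknown"


page = b'<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"><Page/></PcGts>'
alto = b'<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#"><Layout/></alto>'
print(sniff_xml_flavor(io.BytesIO(page)))  # page
print(sniff_xml_flavor(io.BytesIO(alto)))  # alto
```

A check like this would let the views accept text/xml files that really are PAGE-XML while skipping ALTO, instead of blocking the whole fileGrp.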