AttributeError when harvesting OAI records without a metadata child
mrmiguez opened this issue · 1 comments
mrmiguez commented
Our Islandora repository publishes collection records alongside item records. The collection records have a <header> child but no <metadata> child, which raises an AttributeError when Sickle harvests them.
Example collection record: http://fsu.digital.flvc.org/oai2?verb=GetRecord&identifier=oai:fsu.digital.flvc.org:fsu_avc50&metadataPrefix=mods
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2020-04-08T13:25:58Z</responseDate>
  <request>http://fsu.digital.flvc.org/oai2</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:fsu.digital.flvc.org:fsu_avc50</identifier>
        <datestamp>2019-02-27T19:54:49Z</datestamp>
        <setSpec>fsu_stucamplifemain</setSpec>
      </header>
    </record>
  </GetRecord>
</OAI-PMH>
Python example:
from sickle import Sickle
h = Sickle("https://fsu.digital.flvc.org/oai2")
# item record works
rec1 = h.GetRecord(identifier="oai:fsu.digital.flvc.org:fsu_666", metadataPrefix='mods')
# collection record fails
rec2 = h.GetRecord(identifier="oai:fsu.digital.flvc.org:fsu_avc50", metadataPrefix='mods')
Wrapping the metadata lookup in sickle.models.Record.__init__ in a try/except block fixes the issue:
def __init__(self, record_element, strip_ns=True):
    # ...snipped...
    try:
        self.metadata = xml_to_dict(
            self.xml.find(
                './/' + self._oai_namespace + 'metadata'
            ).getchildren()[0], strip_ns=self._strip_ns)
    except AttributeError:
        # No <metadata> element: find() returned None
        self.metadata = None
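For context, the AttributeError comes from find() returning None when no <metadata> element matches, so the chained call is made on None. The sketch below reproduces that lookup with the standard library's xml.etree.ElementTree (not the lxml objects Sickle works with) and uses an explicit None check instead of catching the exception. The helper name first_metadata_child and the trimmed record string are illustrative assumptions, not part of Sickle.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# Trimmed-down version of the collection record above: header only.
HEADER_ONLY = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:fsu.digital.flvc.org:fsu_avc50</identifier>
    <datestamp>2019-02-27T19:54:49Z</datestamp>
    <setSpec>fsu_stucamplifemain</setSpec>
  </header>
</record>"""


def first_metadata_child(record):
    """Return the first child of <metadata>, or None if it is absent or empty.

    Hypothetical helper for illustration; Sickle does not provide it.
    """
    metadata = record.find(".//" + OAI_NS + "metadata")
    if metadata is None or len(metadata) == 0:
        return None
    return metadata[0]


record = ET.fromstring(HEADER_ONLY)
# find() yields None here, which is what the chained call in Record trips over.
missing = record.find(".//" + OAI_NS + "metadata")
container = first_metadata_child(record)
```

Compared with the try/except version, an explicit check will not hide an unrelated AttributeError raised elsewhere in the same expression, and it also covers a <metadata> element that is present but has no children.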
mloesch commented
According to the OAI-PMH specification, the metadata XML element is not optional:
http://www.openarchives.org/OAI/openarchivesprotocol.html#Record
I suggest that you register your own record implementation as described here:
https://sickle.readthedocs.io/en/latest/customizing.html