mloesch/sickle

AttributeError when harvesting OAI records without a metadata child

mrmiguez opened this issue · 1 comment

Our Islandora repository publishes collection records alongside item records. The collection records have a <header> child but no <metadata> child, so Sickle raises an AttributeError when it harvests them.

Example collection record: http://fsu.digital.flvc.org/oai2?verb=GetRecord&identifier=oai:fsu.digital.flvc.org:fsu_avc50&metadataPrefix=mods

<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2020-04-08T13:25:58Z</responseDate>
  <request>http://fsu.digital.flvc.org/oai2</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:fsu.digital.flvc.org:fsu_avc50</identifier>
        <datestamp>2019-02-27T19:54:49Z</datestamp>
        <setSpec>fsu_stucamplifemain</setSpec>
      </header>
    </record>
  </GetRecord>
</OAI-PMH>
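The following sketch shows where the AttributeError comes from. It parses the collection record above with Python's standard-library xml.etree.ElementTree. Sickle itself uses lxml, whose find() also returns None when nothing matches. Searching for the metadata element yields None, and calling getchildren() on None raises AttributeError, as it does inside Sickle's Record. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# The <record> from the GetRecord response above: a header, but no <metadata>.
RECORD_XML = """
<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:fsu.digital.flvc.org:fsu_avc50</identifier>
    <datestamp>2019-02-27T19:54:49Z</datestamp>
    <setSpec>fsu_stucamplifemain</setSpec>
  </header>
</record>
""".strip()

record = ET.fromstring(RECORD_XML)

# find() returns None because there is no matching element...
metadata = record.find(".//" + OAI_NS + "metadata")
print(metadata)  # None

# ...so the getchildren() call made by Sickle's Record fails on None.
error = None
try:
    metadata.getchildren()
except AttributeError as exc:
    error = exc
    print("AttributeError:", exc)
```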

Python example:

from sickle import Sickle

h = Sickle("https://fsu.digital.flvc.org/oai2")

# item record works
rec1 = h.GetRecord(identifier="oai:fsu.digital.flvc.org:fsu_666", metadataPrefix='mods')
# collection record fails
# collection record (header only, no <metadata>) fails with AttributeError
rec2 = h.GetRecord(identifier="oai:fsu.digital.flvc.org:fsu_avc50", metadataPrefix='mods')

Wrapping the metadata parsing in sickle.models.Record.__init__ in a try/except block fixes the issue:

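    def __init__(self, record_element, strip_ns=True):
        # ...snipped...
        try:
            # find() returns None for header-only records, so calling
            # .getchildren() on the result raises AttributeError.
            self.metadata = xml_to_dict(
                self.xml.find(
                    './/' + self._oai_namespace + 'metadata'
                ).getchildren()[0], strip_ns=self._strip_ns)
        except AttributeError:
            # No <metadata> element: keep the record, but without metadata.
            self.metadata = None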
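Catching AttributeError works, but it could also hide unrelated attribute errors raised inside xml_to_dict. A narrower variant tests the result of find() explicitly. Below is a minimal standalone sketch using the standard library's ElementTree. The helper parse_metadata is hypothetical and not part of Sickle. It returns the first child of <metadata>, or None, rather than converting it to a dict as xml_to_dict does. It also covers an empty <metadata> element, which would make getchildren()[0] raise IndexError.

```python
import xml.etree.ElementTree as ET
from typing import Optional

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def parse_metadata(record: ET.Element) -> Optional[ET.Element]:
    """Hypothetical helper: return the payload inside <metadata>, or None."""
    metadata = record.find(".//" + OAI_NS + "metadata")
    if metadata is None or len(metadata) == 0:
        # Header-only record (e.g. an Islandora collection object) or empty <metadata>.
        return None
    # Same element that metadata.getchildren()[0] selects in the patch above.
    return metadata[0]


# Placeholder payloads: an empty MODS root stands in for the real item metadata.
item = ET.fromstring(
    '<record xmlns="http://www.openarchives.org/OAI/2.0/">'
    '<header><identifier>oai:fsu.digital.flvc.org:fsu_666</identifier></header>'
    '<metadata><mods xmlns="http://www.loc.gov/mods/v3"/></metadata>'
    '</record>'
)
collection = ET.fromstring(
    '<record xmlns="http://www.openarchives.org/OAI/2.0/">'
    '<header><identifier>oai:fsu.digital.flvc.org:fsu_avc50</identifier></header>'
    '</record>'
)

print(parse_metadata(item).tag)  # {http://www.loc.gov/mods/v3}mods
print(parse_metadata(collection))  # None
```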

According to the OAI-PMH specification, the metadata XML element is not optional:
http://www.openarchives.org/OAI/openarchivesprotocol.html#Record
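The specification allows one exception: a record whose header has status="deleted" carries only a header. A harvester could flag header-only records like the collection record above before handing them to Sickle. The function is_conformant_record below is a hypothetical sketch of such a check, written with the standard library's ElementTree. The identifier oai:example:1 is a made-up placeholder.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def is_conformant_record(record: ET.Element) -> bool:
    """Hypothetical check: a <record> needs <metadata> unless its header is deleted."""
    header = record.find(OAI_NS + "header")
    if header is None:
        return False  # every record must have a header
    if header.get("status") == "deleted":
        return True  # deleted records carry a header only
    return record.find(OAI_NS + "metadata") is not None


collection = ET.fromstring(
    '<record xmlns="http://www.openarchives.org/OAI/2.0/">'
    '<header><identifier>oai:fsu.digital.flvc.org:fsu_avc50</identifier></header>'
    '</record>'
)
deleted = ET.fromstring(
    '<record xmlns="http://www.openarchives.org/OAI/2.0/">'
    '<header status="deleted"><identifier>oai:example:1</identifier></header>'
    '</record>'
)

print(is_conformant_record(collection))  # False
print(is_conformant_record(deleted))  # True
```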

I suggest that you register your own record implementation as described here:
https://sickle.readthedocs.io/en/latest/customizing.html