Realize an empty publication date if METS header is absent instead of failing with a Python error
tboenig opened this issue · 6 comments
Hi @wrznr,
I use your program with data from SBB.
Here is an example:
mm2tei -o "https://oai.sbb.berlin/oai/?verb=GetRecord&metadataPrefix=mets&identifier=oai:digital.staatsbibliothek-berlin.de:PPN66438790X" >test.tei.xml
Another example, from SUB Göttingen:
mm2tei -o "https://gdz.sub.uni-goettingen.de/mets/PPN228873541.mets.xml" >test.tei.xml
Here we find the same SSL problem.
Is the SSL problem a problem on the SBB side or a problem in your program?
Hi @tboenig, could you please post the error message? That would make it easier to get an idea of what is going wrong.
Here is the SBB error:
Traceback (most recent call last):
File "/usr/lib/python3.6/urllib/request.py", line 1318, in do_open
encode_chunked=req.has_header('Transfer-encoding'))
File "/usr/lib/python3.6/http/client.py", line 1254, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/usr/lib/python3.6/http/client.py", line 1300, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/usr/lib/python3.6/http/client.py", line 1249, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/usr/lib/python3.6/http/client.py", line 1036, in _send_output
self.send(msg)
File "/usr/lib/python3.6/http/client.py", line 974, in send
self.connect()
File "/usr/lib/python3.6/http/client.py", line 1415, in connect
server_hostname=server_hostname)
File "/usr/lib/python3.6/ssl.py", line 407, in wrap_socket
_context=self, _session=session)
File "/usr/lib/python3.6/ssl.py", line 817, in __init__
self.do_handshake()
File "/usr/lib/python3.6/ssl.py", line 1077, in do_handshake
self._sslobj.do_handshake()
File "/usr/lib/python3.6/ssl.py", line 689, in do_handshake
self._sslobj.do_handshake()
ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:852)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "mets-mods2tei/env/lib/python3.6/site-packages/mets_mods2tei/scripts/mets_mods2tei.py", line 27, in cli
f = urlopen(mets)
File "/usr/lib/python3.6/urllib/request.py", line 223, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib/python3.6/urllib/request.py", line 526, in open
response = self._open(req, data)
File "/usr/lib/python3.6/urllib/request.py", line 544, in _open
'_open', req)
File "/usr/lib/python3.6/urllib/request.py", line 504, in _call_chain
result = func(*args)
File "/usr/lib/python3.6/urllib/request.py", line 1361, in https_open
context=self._context, check_hostname=self._check_hostname)
File "/usr/lib/python3.6/urllib/request.py", line 1320, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:852)>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "mets-mods2tei/env/bin/mm2tei", line 8, in <module>
sys.exit(cli())
File "mets-mods2tei/env/lib/python3.6/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "mets-mods2tei/env/lib/python3.6/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "mets-mods2tei/env/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "mets-mods2tei/env/lib/python3.6/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "mets-mods2tei/env/lib/python3.6/site-packages/mets_mods2tei/scripts/mets_mods2tei.py", line 29, in cli
f = open(mets, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'https://oai.sbb.berlin/oai/?verb=GetRecord&metadataPrefix=mets&identifier=oai:digital.staatsbibliothek-berlin.de:PPN66438790X'
And here is the SUB Göttingen error.
Sorry, it is not the same SSL error:
Traceback (most recent call last):
File "mets-mods2tei/env/bin/mm2tei", line 8, in <module>
sys.exit(cli())
File "mets-mods2tei/env/lib/python3.6/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "mets-mods2tei/env/lib/python3.6/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "mets-mods2tei/env/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "mets-mods2tei/env/lib/python3.6/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "mets-mods2tei/env/lib/python3.6/site-packages/mets_mods2tei/scripts/mets_mods2tei.py", line 35, in cli
mets.fromfile(f)
File "mets-mods2tei/env/lib/python3.6/site-packages/mets_mods2tei/api/mets.py", line 112, in fromfile
self.__spur()
File "mets-mods2tei/env/lib/python3.6/site-packages/mets_mods2tei/api/mets.py", line 233, in __spur
self.encoding_date = header.get_CREATEDATE().isoformat()
AttributeError: 'NoneType' object has no attribute 'get_CREATEDATE'
The former problem is most likely a problem at the host (SBB) or your own institution. Sorry.
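Wherever the certificate problem originates, the first traceback ends in a misleading FileNotFoundError because the CLI retries the URL as a local file name after urlopen fails. A minimal sketch of how the two cases could be kept apart, assuming a hypothetical helper open_mets (not part of mets-mods2tei; the fetch parameter exists only to make the helper easy to test):

```python
from urllib.error import URLError
from urllib.parse import urlparse
from urllib.request import urlopen


def open_mets(source, fetch=urlopen):
    """Return a binary file-like object for a URL or a local path.

    Hypothetical helper for illustration, not part of mets-mods2tei.
    """
    if urlparse(source).scheme in ("http", "https"):
        try:
            return fetch(source)
        except URLError as err:
            # Report the network/SSL failure directly instead of
            # retrying the URL as a local file name.
            raise SystemExit(f"Cannot fetch {source}: {err.reason}")
    # Anything that is not an http(s) URL is treated as a local path.
    return open(source, "rb")
```

With this split, a certificate failure would surface as a single message naming the URL and the SSL reason, rather than as a chain of three exceptions.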
The latter problem is caused by the missing metsHdr element in the METS file you want to process (cf. https://digital.slub-dresden.de/oai/?verb=GetRecord&metadataPrefix=mets&identifier=oai:de:slub-dresden:db:id-453779263). The METS file from Göttingen contains no information about when it was created, but such information is mandatory for valid DTABf. If you have ideas on how to fix this, I will gladly implement them.
Hi @wrznr,
"If you have ideas on how to fix this, I will gladly implement them."
My suggestion:
- ignore the empty or missing metsHdr and emit an empty <date type="publication"/>
- or print an error message on the CLI, e.g. that the METS file is not valid.
I think a combination of both would be ideal.
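The combination suggested here could look roughly like the following. This is only a sketch under stated assumptions: creation_date and publication_date_element are hypothetical helpers, header stands for whatever object the METS parser returns for metsHdr (None when the element is missing), and only get_CREATEDATE() is taken from the traceback above.

```python
import logging
import xml.etree.ElementTree as ET

log = logging.getLogger("mm2tei-sketch")


def creation_date(header):
    """Return the ISO creation date, or None if no metsHdr is present."""
    if header is None:
        # Signal the problem on the command line, but keep going.
        log.warning("METS file has no metsHdr; leaving the date empty")
        return None
    created = header.get_CREATEDATE()
    return created.isoformat() if created is not None else None


def publication_date_element(iso_date):
    """Build <date type="publication">, left empty when iso_date is None.

    Assumption: the value goes into the element text; the real TEI
    template may place it elsewhere.
    """
    date = ET.Element("date", {"type": "publication"})
    if iso_date:
        date.text = iso_date
    return date
```

The warning keeps the user informed that the input is incomplete, while the empty element lets the conversion finish instead of aborting with an AttributeError.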
@tboenig I have difficulty implementing these fallbacks/error signals for missing headers, because I cannot find exact documentation of DTABf and TEI proper.
For example, one of the dependent elements of metsHdr is the mets:agent, which is used for encodingDesc:
mets-mods2tei/mets_mods2tei/api/mets.py
Line 245 in fc7b0f7
(I don't know why we throw away all but the first agent and all but its name, but granted.)
This information usually ends up in simple p elements:
mets-mods2tei/mets_mods2tei/api/tei.py
Lines 471 to 473 in fc7b0f7
Now, according to DTABf there is supposed to be an intermediate editorialDecl here. But the only reference I can find on that is in the (IIUC) Examples schema.
So what is the correct representation here, and what should I put in as a fallback in case the metsHdr is missing?