The "replaced by" terms are not extracted in certain situations
tskir opened this issue · 3 comments
Consider two files from this EFO release: https://github.com/EBISPOT/efo/releases/tag/v3.42.0. efo.owl is the whole ontology, while efo_otar_slim.owl is a subset of it used for the Open Targets project.
Further consider an example term, http://www.orpha.net/ORDO/Orphanet_293843. In both files it is present, marked as obsolete, and a replacement is provided using an IAO_0100001 code. The replacement term is http://purl.obolibrary.org/obo/MONDO_0017398, which is also present in both files and is current (not marked as obsolete).
However, I observe different behaviour when using the replaced_by property on these two files. Using the following simple code:
import pronto

# ontology_filename is the path to either efo.owl or efo_otar_slim.owl
for term in pronto.Ontology(ontology_filename).terms():
    if term.id.endswith('293843'):
        print(term.replaced_by)
I obtain the following results:
efo.owl: TermSet({Term('MONDO:0017398', name='3MC syndrome')}) (as expected)
efo_otar_slim.owl: TermSet({}) (the term is not extracted)
I tried doing some debugging, but failed to reach a conclusive explanation of why this happens. It seems to me that the problem happens somewhere around this part:
pronto/pronto/parsers/rdfxml.py, lines 414 to 427 in a0186ff
When running with efo_otar_slim.owl and this particular term (Orphanet_293843), the attrib dictionary is empty, hence neither the if nor the elif block gets executed. The same attrib dictionary is not empty when running with efo.owl. I don't know quite why this happens.
I also noticed that in the efo_otar_slim.owl file, the replacement terms specified by the IAO_0100001 code sometimes point outside of the file: EBISPOT/efo#1595. I'm not sure whether this is related to the behaviour observed in this issue.
@tskir the XML for http://www.orpha.net/ORDO/Orphanet_293843 is slightly different between those two files, specifically as it relates to replaced_by...
in "slim":
<obo:IAO_0100001>http://purl.obolibrary.org/obo/MONDO_0017398</obo:IAO_0100001>
in "full":
<obo:IAO_0100001 rdf:datatype="http://www.w3.org/2001/XMLSchema#string">http://purl.obolibrary.org/obo/MONDO_0017398</obo:IAO_0100001>
Since the "full" IAO_0100001 tag has an rdf:datatype attribute, line 419 of rdfxml.py evaluates to true, and the text is added to replaced_by. In the "slim" file, however, there are no attributes on the IAO_0100001 tag, so neither line 415 nor line 419 evaluates to true, and replaced_by remains empty.
I think the right solution here is to modify line 419 to read
elif text is not None:
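To illustrate what that change would do, here is a simplified stand-in for the branch, not the actual pronto code. It assumes that the first branch (line 415) tests for an rdf:resource attribute, which is not shown in this thread, and that line 419 currently tests for rdf:datatype as described above; both function names are hypothetical.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_RESOURCE = "{%s}resource" % RDF
RDF_DATATYPE = "{%s}datatype" % RDF
MONDO_IRI = "http://purl.obolibrary.org/obo/MONDO_0017398"


def replaced_by_current(attrib, text):
    """Stand-in for the current behaviour (assumed conditions)."""
    if RDF_RESOURCE in attrib:    # assumed condition for line 415
        return attrib[RDF_RESOURCE]
    elif RDF_DATATYPE in attrib:  # condition described for line 419
        return text
    return None                   # nothing is extracted


def replaced_by_proposed(attrib, text):
    """Stand-in with the proposed condition on line 419."""
    if RDF_RESOURCE in attrib:
        return attrib[RDF_RESOURCE]
    elif text is not None:        # proposed replacement condition
        return text
    return None


# Attributes as they would look for the "full" and "slim" variants.
full_attrib = {RDF_DATATYPE: "http://www.w3.org/2001/XMLSchema#string"}
slim_attrib = {}

print(replaced_by_current(full_attrib, MONDO_IRI))   # IRI is extracted
print(replaced_by_current(slim_attrib, MONDO_IRI))   # None
print(replaced_by_proposed(slim_attrib, MONDO_IRI))  # IRI is extracted
```

Under these assumptions, the proposed condition still handles the "full" variant and additionally picks up the plain-text form used in the "slim" file.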