althonos/pronto

The "replaced by" terms are not extracted in certain situations

tskir opened this issue · 3 comments

tskir commented

Consider two files from this EFO release: https://github.com/EBISPOT/efo/releases/tag/v3.42.0. efo.owl is the whole ontology, while efo_otar_slim.owl is a subsection of it used for the purposes of the Open Targets project.

Further consider an example term, http://www.orpha.net/ORDO/Orphanet_293843. It is present in both files, marked as obsolete, and a replacement is provided using an IAO_0100001 code. The replacement term is http://purl.obolibrary.org/obo/MONDO_0017398, which is also present in both files and is current (not marked as obsolete).

However, I observe different behaviour when trying to use the replaced_by property between these two files. Using the following simple code:

import pronto

ontology_filename = "efo_otar_slim.owl"  # or "efo.owl"
for term in pronto.Ontology(ontology_filename).terms():
    if term.id.endswith('293843'):
        print(term.replaced_by)

I obtain the following results:

  • efo.owl: TermSet({Term('MONDO:0017398', name='3MC syndrome')}) (as expected)
  • efo_otar_slim.owl: TermSet({}) (the term is not extracted)

tskir commented

I tried doing some debugging, but failed to reach a conclusive explanation of why this happens. It seems to me that the problem happens somewhere around this part:

elif tag == _NS["obo"]["IAO_0100001"]:
    if _NS["rdf"]["resource"] in attrib:
        iri = attrib[_NS["rdf"]["resource"]]
        curie = curies.get(iri) or self._compact_id(iri)
        termdata.replaced_by.add(curie)
    elif _NS["rdf"]["datatype"] in attrib:
        curie = curies.get(text) or self._compact_id(text)
        termdata.replaced_by.add(curie)
    else:
        warnings.warn(
            "could not extract ID from `IAO:0100001` annotation",
            SyntaxWarning,
            stacklevel=2,
        )

When this code runs against efo_otar_slim.owl for this particular term (Orphanet_293843), the attrib dictionary is empty, so neither the if nor the elif branch is executed. With efo.owl, the same attrib dictionary is not empty. I don't know why this happens.

tskir commented

I also noticed that in the efo_otar_slim.owl file the replacement terms specified by the IAO_0100001 code sometimes point to terms outside the file: EBISPOT/efo#1595. I'm not sure whether this is related to the behaviour observed in this issue.

@tskir the XML for http://www.orpha.net/ORDO/Orphanet_293843 differs slightly between those two files, specifically in how replaced_by is encoded...

in "slim":

<obo:IAO_0100001>http://purl.obolibrary.org/obo/MONDO_0017398</obo:IAO_0100001>

in "full":

<obo:IAO_0100001 rdf:datatype="http://www.w3.org/2001/XMLSchema#string">http://purl.obolibrary.org/obo/MONDO_0017398</obo:IAO_0100001>

Since the "full" IAO_0100001 tag has an rdf:datatype attribute, the check on line 419 of rdfxml.py (the rdf:datatype test) evaluates to true, and the text is added to replaced_by. In the "slim" file, however, there are no attributes on the IAO_0100001 tag, so neither line 415 (the rdf:resource test) nor line 419 evaluates to true, and replaced_by remains empty.
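The difference can be reproduced in isolation. This is a minimal sketch using the standard library's xml.etree.ElementTree rather than pronto's own parsing code; parse_fragment is a hypothetical helper introduced only for illustration, and the wrapping root element exists solely to declare the namespaces:

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OBO = "http://purl.obolibrary.org/obo/"


def parse_fragment(fragment):
    # Hypothetical helper: wrap the fragment so the rdf/obo prefixes resolve,
    # then return the single child element.
    wrapped = f'<root xmlns:rdf="{RDF}" xmlns:obo="{OBO}">{fragment}</root>'
    return ET.fromstring(wrapped)[0]


slim = parse_fragment(
    "<obo:IAO_0100001>"
    "http://purl.obolibrary.org/obo/MONDO_0017398"
    "</obo:IAO_0100001>"
)
full = parse_fragment(
    '<obo:IAO_0100001 rdf:datatype="http://www.w3.org/2001/XMLSchema#string">'
    "http://purl.obolibrary.org/obo/MONDO_0017398"
    "</obo:IAO_0100001>"
)

print(slim.attrib)             # {} : nothing for either branch to match
print(list(full.attrib))       # one key, the namespaced rdf:datatype attribute
print(slim.text == full.text)  # True : the IRI itself is present in both
```

In both cases the element text carries the replacement IRI; only the presence of the attribute differs.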

I think the right solution here is to modify line 419 to read

     elif text is not None:
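
To illustrate what that change would do, here is a rough standalone sketch of the same branching logic. It is not pronto's actual code: replacement_iri and parse_fragment are hypothetical names, xml.etree.ElementTree stands in for the real parser, and the CURIE compaction step is left out.

```python
import warnings
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OBO = "http://purl.obolibrary.org/obo/"
MONDO = "http://purl.obolibrary.org/obo/MONDO_0017398"


def replacement_iri(elem):
    # Hypothetical helper mirroring the branch structure quoted earlier,
    # with the second test relaxed from "has rdf:datatype" to "has text".
    resource = elem.attrib.get(f"{{{RDF}}}resource")
    if resource is not None:
        return resource
    elif elem.text is not None:
        return elem.text
    warnings.warn(
        "could not extract ID from `IAO:0100001` annotation",
        SyntaxWarning,
        stacklevel=2,
    )
    return None


def parse_fragment(fragment):
    # Hypothetical helper: declare the namespaces and return the child element.
    wrapped = f'<root xmlns:rdf="{RDF}" xmlns:obo="{OBO}">{fragment}</root>'
    return ET.fromstring(wrapped)[0]


slim = parse_fragment(f"<obo:IAO_0100001>{MONDO}</obo:IAO_0100001>")
full = parse_fragment(
    '<obo:IAO_0100001 rdf:datatype="http://www.w3.org/2001/XMLSchema#string">'
    f"{MONDO}</obo:IAO_0100001>"
)
by_resource = parse_fragment(f'<obo:IAO_0100001 rdf:resource="{MONDO}"/>')

for elem in (slim, full, by_resource):
    print(replacement_iri(elem))  # the MONDO IRI in all three cases
```

Because the "full" form also has non-empty text, relaxing the test should not change its result; only an element with neither an rdf:resource attribute nor any text would still reach the warning.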