New translation of GAF to OWL
cmungall opened this issue · 4 comments
The goal here is to produce a dataset from GO for submission to the Ontology Reasoner Evaluation (ORE), using a combination of ontologies and GAFs. Because GAF lines are currently independent of one another, the interesting connectivity of the data is limited (richer connectivity would be served by the conversion of a BioPAX resource to lego-owl).
The current situation is that we have two paths for converting GAF to OWL:
- Using --gaf2lego, which is specified here http://geneontology.org/experimental/lego/docs/lego-from-gaf.txt and uses the lego class expression model. Let's call this lego-CE.
- Using --gaf2owl, which has a hodgepodge of command line options that allow for complex mixing and matching of different models. The underlying code is GAFOWLBridge. This translation is used in the current Jenkins GAF checks (e.g. for taxon and c16 violations). Let's call this GAF-OWL.
For the purposes of the ORE, we think it best to focus efforts on a lego individual model; call this lego-I. Heiko will write this from scratch. The specification is essentially the composition of the lego-CE translation and the mapping to the OWL individual model for LEGO. This could probably be restated in a simpler form (todo).
To make things more interesting, we will include a strong translation of negation that basically "blocks out" all individuals that could instantiate that pattern. E.g. if the GAF says "foo NOT in nucleus", we will map this to
foo DisjointWith enables some (occurs_in some nucleus)
Let's call this lego-I-strongNeg.
This means that if there is a separate positive GAF line that says "foo in nucleus", it would generate
:1 Type foo
:1 enables :2
:2 occurs_in :3
:3 Type go:nucleus
This would yield an inconsistency, since the class axiom above effectively "blocks out" this ABox pattern for foos (but note: this is not detected in ELK, as it uses OPEs).
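The clash described above can be illustrated with a minimal Python sketch. This is not the owltools implementation: the function names (positive_to_abox, negative_to_axiom, find_clashes) are hypothetical, the axiom is rendered as a plain string, and the clash check is a naive exact match on (gene product, GO class) pairs that ignores the class hierarchy a real reasoner would take into account.

```python
from itertools import count


def positive_to_abox(gene_product, go_class, ids=None):
    """Return lego-I style triples for one positive GAF line (hypothetical helper)."""
    if ids is None:
        ids = count(1)
    # Three fresh individuals: the gene product, the activity, the location.
    gp, activity, location = (f":{next(ids)}" for _ in range(3))
    return [
        (gp, "Type", gene_product),
        (gp, "enables", activity),
        (activity, "occurs_in", location),
        (location, "Type", go_class),
    ]


def negative_to_axiom(gene_product, go_class):
    """Render the strong-negation class axiom for one NOT line as text."""
    return f"{gene_product} DisjointWith enables some (occurs_in some {go_class})"


def find_clashes(positives, negatives):
    """Naive check: pairs asserted both positively and negatively.

    Assumption: exact matching only, no subclass reasoning.
    """
    return set(positives) & set(negatives)


if __name__ == "__main__":
    for triple in positive_to_abox("foo", "go:nucleus"):
        print(*triple)
    print(negative_to_axiom("foo", "nucleus"))
    print(find_clashes([("foo", "go:nucleus")], [("foo", "go:nucleus")]))
```

A reasoner would additionally flag positive annotations to subclasses of the negated class, which this exact-match sketch does not attempt.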
I think this is interesting from the point of view of the ORE, where we want to benchmark the detection of inconsistencies. But it is probably too strong for a general-purpose GO translation (unless the negation axioms are isolated), because GO annotations naturally have these "inconsistencies".
We could submit two versions for the ORE - one where the challenge is to find the inconsistencies, the other with negation removed where the goal is to answer some query.
As for an interesting query: I understand we can use SPARQL. We have less connectivity in a DL query since the individual graphs are disconnected, but the overall ABox+TBox graph is more connected.
Queries of the form "which geneProducts do X" in lego-I have the form:
SELECT ?geneProductClass
WHERE { ?geneProductIndividual a ?geneProductClass .
?geneProductIndividual ?...}
(since we essentially have separate disconnected lego models, where geneProducts (which may be genes in the naive translation) are united by the class they instantiate (e.g. "Mouse Shh" is a class)).
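To make the query shape concrete, here is a small Python sketch that evaluates the same pattern over in-memory triples. It is only an illustration under stated assumptions: the triples reuse the simplified notation from the example above rather than real IRIs, the function name gene_product_classes is hypothetical, and "do X" is fixed to "enable something that occurs in a given GO class". The point is that the join across disconnected models happens through the gene product class, not through the individuals.

```python
from collections import defaultdict


def gene_product_classes(triples, go_class):
    """Classes whose instances enable something occurring in go_class."""
    types = defaultdict(set)      # individual -> asserted classes
    enables = defaultdict(set)    # gene product individual -> activities
    occurs_in = defaultdict(set)  # activity -> location individuals
    for subject, predicate, obj in triples:
        if predicate == "Type":
            types[subject].add(obj)
        elif predicate == "enables":
            enables[subject].add(obj)
        elif predicate == "occurs_in":
            occurs_in[subject].add(obj)

    found = set()
    for gene_product, activities in enables.items():
        for activity in activities:
            for location in occurs_in.get(activity, ()):
                if go_class in types.get(location, ()):
                    # Report the class, since individuals are model-local.
                    found |= types.get(gene_product, set())
    return found


# Three disconnected models; two of them share the class "foo".
MODELS = [
    (":1", "Type", "foo"), (":1", "enables", ":2"),
    (":2", "occurs_in", ":3"), (":3", "Type", "go:nucleus"),
    (":4", "Type", "bar"), (":4", "enables", ":5"),
    (":5", "occurs_in", ":6"), (":6", "Type", "go:cytosol"),
    (":7", "Type", "foo"), (":7", "enables", ":8"),
    (":8", "occurs_in", ":9"), (":9", "Type", "go:nucleus"),
]

if __name__ == "__main__":
    print(sorted(gene_product_classes(MODELS, "go:nucleus")))  # prints ['foo']
```

Like the naive translation it mirrors, this sketch matches only asserted types; it does not infer membership in superclasses.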
cc @dosumis
> We could submit two versions for the ORE - one where the challenge is to find the inconsistencies, the other with negation removed where the goal is to answer some query.
Why not submit a version with explicit NOTs for DL classification/querying?
Well, it may take a bit of work to generate a version that both has strong negation and is coherent. I don't think we'd want to submit an incoherent ontology for anything other than a coherence test.
(coherent = all classes satisfiable, and no inconsistencies)
There is now a new owltools command:
--gaf-lego-indivduals -o output [--add-line-number]
The --add-line-number flag adds an annotation to (some) axioms with the corresponding GAF line number. This is mostly helpful when you want to trace issues in the translation.
The new command is provisional and intended for experimentation; at some point we will switch to the originally requested one, --gaf2lego.
Closing the issue for now until specs change or bugs are found.