owlcollab/owltools

Using the make-species-subset command to extract plant-specific terms from go-plus.owl

wkpalan opened this issue · 9 comments

I am trying to follow the steps described in #165 to make a plant-specific OBO file, using the following command:

 owltools go-plus.owl --reasoner hermit --make-species-subset -t NCBITaxon:33090 -o -f obo plant.obo

This works and produces a plant.obo file, and I do not run into memory errors, but the command prints multiple warnings. Here is an example of one:

2019-05-13 16:44:47,724 WARN  (ChangeIndexingProcessor:66) [reasoner.indexing.axiomIgnored]ELK does not support ObjectAllValuesFrom. Axiom ignored: 
SubClassOf(<http://purl.obolibrary.org/obo/GO_0048838> 
ObjectAllValuesFrom(<http://purl.obolibrary.org/obo/RO_0002162> 
<http://purl.obolibrary.org/obo/NCBITaxon_33090>))

I was wondering what this warning means. The GO term itself is plant-specific, but it does not show up in the plant.obo file produced by the command above.

id: GO:0048838
name: release of seed from dormancy
namespace: biological_process
def: "The process in which the dormant state is broken in a seed. Dormancy is characterized by a suspension of physiological activity that can be reactivated upon release." [GOC:dph, GOC:jid, GOC:tb, ISBN:9781405139830]
is_a: GO:0010162 ! seed dormancy process
is_a: GO:0097438 ! exit from dormancy

The OWL section for GO:0048838 is shown below. There appear to be two restrictions, only_in_taxon and in_taxon, and one of them uses the allValuesFrom construct mentioned in the warning. Is there a workaround I am missing?

<owl:Class rdf:about="http://purl.obolibrary.org/obo/GO_0048838">
    <rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/GO_0010162"/>
    <rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/GO_0097438"/>
    <rdfs:subClassOf>
        <owl:Restriction>
            <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/RO_0002160"/>
            <owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/NCBITaxon_33090"/>
        </owl:Restriction>
    </rdfs:subClassOf>
    <rdfs:subClassOf>
        <owl:Restriction>
            <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/RO_0002162"/>
            <owl:allValuesFrom rdf:resource="http://purl.obolibrary.org/obo/NCBITaxon_33090"/>
        </owl:Restriction>
    </rdfs:subClassOf>
    <obo:IAO_0000115 rdf:datatype="http://www.w3.org/2001/XMLSchema#string">The process in which the dormant state is broken in a seed. Dormancy is characterized by a suspension of physiological activity that can be reactivated upon release.</obo:IAO_0000115>
    <oboInOwl:hasOBONamespace rdf:datatype="http://www.w3.org/2001/XMLSchema#string">biological_process</oboInOwl:hasOBONamespace>
    <oboInOwl:id rdf:datatype="http://www.w3.org/2001/XMLSchema#string">GO:0048838</oboInOwl:id>
    <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">release of seed from dormancy</rdfs:label>
</owl:Class>
<owl:Axiom>
    <owl:annotatedSource rdf:resource="http://purl.obolibrary.org/obo/GO_0048838"/>
    <owl:annotatedProperty rdf:resource="http://purl.obolibrary.org/obo/IAO_0000115"/>
    <owl:annotatedTarget rdf:datatype="http://www.w3.org/2001/XMLSchema#string">The process in which the dormant state is broken in a seed. Dormancy is characterized by a suspension of physiological activity that can be reactivated upon release.</owl:annotatedTarget>
    <oboInOwl:hasDbXref rdf:datatype="http://www.w3.org/2001/XMLSchema#string">GOC:dph</oboInOwl:hasDbXref>
    <oboInOwl:hasDbXref rdf:datatype="http://www.w3.org/2001/XMLSchema#string">GOC:jid</oboInOwl:hasDbXref>
    <oboInOwl:hasDbXref rdf:datatype="http://www.w3.org/2001/XMLSchema#string">GOC:tb</oboInOwl:hasDbXref>
    <oboInOwl:hasDbXref rdf:datatype="http://www.w3.org/2001/XMLSchema#string">ISBN:9781405139830</oboInOwl:hasDbXref>
</owl:Axiom>
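To see which of the two restrictions the warning refers to, you can list the taxon restrictions with a short script. This is a minimal sketch that uses only Python's standard xml.etree.ElementTree and is not part of owltools. It assumes the class is wrapped in an rdf:RDF root that declares the rdf, rdfs and owl namespaces, because the fragment above omits those declarations. It also maps RO_0002160 to only_in_taxon and RO_0002162 to in_taxon, following the Relation Ontology labels.

```python
import xml.etree.ElementTree as ET

# Namespace IRIs for the RDF/XML vocabularies used in the snippet.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

# Labels for the two Relation Ontology (RO) properties in the snippet.
RELATION_LABELS = {
    "http://purl.obolibrary.org/obo/RO_0002160": "only_in_taxon",
    "http://purl.obolibrary.org/obo/RO_0002162": "in_taxon",
}

# The restrictions from GO:0048838, wrapped in an rdf:RDF root with
# explicit namespace declarations so the fragment parses on its own.
SNIPPET = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://purl.obolibrary.org/obo/GO_0048838">
    <rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/GO_0010162"/>
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/RO_0002160"/>
        <owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/NCBITaxon_33090"/>
      </owl:Restriction>
    </rdfs:subClassOf>
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/RO_0002162"/>
        <owl:allValuesFrom rdf:resource="http://purl.obolibrary.org/obo/NCBITaxon_33090"/>
      </owl:Restriction>
    </rdfs:subClassOf>
  </owl:Class>
</rdf:RDF>"""


def taxon_restrictions(xml_text):
    """Return (class IRI, relation, quantifier, filler, in_owl_el) per restriction."""
    rows = []
    root = ET.fromstring(xml_text)
    for cls in root.iterfind("owl:Class", NS):
        # Named superclasses (rdf:resource only) have no owl:Restriction
        # child, so this path skips them.
        for restr in cls.iterfind("rdfs:subClassOf/owl:Restriction", NS):
            prop = restr.find("owl:onProperty", NS)
            some = restr.find("owl:someValuesFrom", NS)
            only = restr.find("owl:allValuesFrom", NS)
            if prop is None or (some is None and only is None):
                continue  # other restriction kinds are out of scope here
            filler = some if some is not None else only
            prop_iri = prop.get(RDF_RESOURCE)
            rows.append((
                cls.get(RDF_ABOUT),
                RELATION_LABELS.get(prop_iri, prop_iri),
                "some" if some is not None else "only",
                filler.get(RDF_RESOURCE),
                # ObjectSomeValuesFrom is allowed in OWL 2 EL; ObjectAllValuesFrom
                # is not, which is why ELK reports that axiom as ignored.
                some is not None,
            ))
    return rows


for row in taxon_restrictions(SNIPPET):
    print(row)
```

On this snippet, the script reports the only_in_taxon restriction as a "some" restriction, which OWL EL allows. It reports the in_taxon restriction as an "only" restriction, which falls outside OWL EL. That second restriction matches the ObjectAllValuesFrom(RO_0002162 ...) axiom named in the warning.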

@wkpalan this is a warning from the ELK reasoner, and it is nothing to worry about. GO uses a few OWL axioms that are outside the OWL EL profile, the subset of OWL that ELK supports. The warning just means that ELK ignores those axioms when computing the classification it is able to do.
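You may still want to know which terms these ignored axioms affect, for example to check whether a term missing from the subset is among them. In that case you can tally the warnings from the saved log. This is a rough Python sketch. It assumes the log layout quoted at the top of this issue: a WARN line ending in "Axiom ignored:", followed by the axiom in OWL functional syntax, possibly wrapped over several lines. The regular expressions were written against that one sample and are not an ELK or owltools interface.

```python
import re
from collections import Counter

# Sample log text copied from the warning quoted in this issue.
LOG = """2019-05-13 16:44:47,724 WARN  (ChangeIndexingProcessor:66) [reasoner.indexing.axiomIgnored]ELK does not support ObjectAllValuesFrom. Axiom ignored:
SubClassOf(<http://purl.obolibrary.org/obo/GO_0048838>
ObjectAllValuesFrom(<http://purl.obolibrary.org/obo/RO_0002162>
<http://purl.obolibrary.org/obo/NCBITaxon_33090>))
"""

# Start of an "axiom ignored" warning; the group captures the OWL construct.
WARN_RE = re.compile(r"ELK does not support (\w+)\. Axiom ignored:")
# Subject class of a SubClassOf axiom written in functional syntax.
SUBJECT_RE = re.compile(r"SubClassOf\(<([^>]+)>")


def ignored_axioms(log_text):
    """Return a list of (construct, subject IRI or None), one per warning."""
    # With one capturing group, re.split yields
    # [before, construct1, body1, construct2, body2, ...].
    chunks = WARN_RE.split(log_text)
    results = []
    for construct, body in zip(chunks[1::2], chunks[2::2]):
        match = SUBJECT_RE.search(body)
        results.append((construct, match.group(1) if match else None))
    return results


found = ignored_axioms(LOG)
print(Counter(construct for construct, _ in found))
print([subject for _, subject in found])
```

To run it on a real log, read the redirected output and pass its text in, for example `ignored_axioms(open("obo.out.txt").read())`.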

I see you specified "hermit", but without looking into the owltools code, I'm not sure why it's using ELK instead of HermiT. However, if you really do try to classify go-plus with HermiT, you will be waiting a VERY long time.

Thanks @balhoff. I will ignore the warning for now.

I'm sorry, I pasted the wrong command. I used the ELK reasoner first:

owltools go-plus.owl --reasoner elk --make-species-subset -t NCBITaxon:33090 -o -f obo plant.obo

Then I tried HermiT to see whether that would work. As you predicted, that command still hasn't finished and is using all 60 GB of allocated memory.

I am still unclear why the specific GO term is not included in the plant.obo output file.

@wkpalan do you get any other errors when making the obo file, such as "duplicate labels"? I wonder if you are experiencing something related to this issue: geneontology/go-ontology#17263

I am currently checking it. I'll get back to you once I have results.

I ran the above command with output in two different formats, OBO and OWL.

obo

owltools go-plus.owl --reasoner hermit --make-species-subset -t NCBITaxon:33090 -o -f obo plant.obo > obo.out.txt 2>&1

obo.out.txt

owl

owltools go-plus.owl --reasoner hermit --make-species-subset -t NCBITaxon:33090 -o -f owl plant.owl > owl.out.txt 2>&1

owl.out.txt

Neither command produced errors, but neither output includes the GO term above.
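One way to confirm which terms ended up in the subset is to collect the id of every [Term] stanza in the output and check for the term you expect. This is a minimal Python sketch. It assumes the output is an ordinary OBO flat file, with bracketed stanza headers followed by "tag: value" lines, and it is not a full OBO parser. The SAMPLE text is a hypothetical stand-in for plant.obo.

```python
import io


def term_ids(lines):
    """Collect the id of every [Term] stanza from an iterable of OBO lines."""
    ids = set()
    in_term = False
    for raw in lines:
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            # Stanza header: only [Term] stanzas are collected.
            in_term = line == "[Term]"
        elif in_term and line.startswith("id:"):
            ids.add(line[len("id:"):].strip())
    return ids


# Hypothetical stand-in for plant.obo; on the real file use
#   with open("plant.obo", encoding="utf-8") as fh:
#       ids = term_ids(fh)
SAMPLE = io.StringIO("""format-version: 1.2

[Term]
id: GO:0010162
name: seed dormancy process

[Typedef]
id: only_in_taxon
""")

ids = term_ids(SAMPLE)
for wanted in ("GO:0010162", "GO:0048838"):
    print(wanted, "present" if wanted in ids else "missing")
```

The script ignores ids from [Typedef] and other stanza types, so relation names such as only_in_taxon are not mistaken for terms.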

I downloaded the latest go-plus.owl from http://purl.obolibrary.org/obo/go/extensions/go-plus.owl and now I get the same error mentioned in geneontology/go-ontology#17263.

@wkpalan I am working to clean up the double labels in the GO imports, but in the meantime you can get an OBO file by using --no-check:

owltools go-plus.owl --reasoner hermit --make-species-subset -t NCBITaxon:33090 -o -f obo --no-check plant.obo > obo.out.txt 2>&1
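To locate the classes behind the duplicate-label error, one option is to stream through the RDF/XML and count the rdfs:label children of each top-level owl:Class. This sketch uses Python's xml.etree.ElementTree.iterparse, so the whole file is never built as a single tree. It assumes labels are asserted as rdfs:label children of owl:Class elements, as in the GO:0048838 excerpt in this issue, so it will miss labels asserted elsewhere (for example on rdf:Description nodes). This is not the check owltools itself performs, and the example.org IRIs are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

RDF_ABOUT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}about"
CLASS_TAG = "{http://www.w3.org/2002/07/owl#}Class"
LABEL_TAG = "{http://www.w3.org/2000/01/rdf-schema#}label"


def classes_with_duplicate_labels(source):
    """Map class IRI -> labels for top-level owl:Class elements with >1 rdfs:label.

    `source` may be a filename (e.g. "go-plus.owl") or a binary file object.
    Labels that differ only in xml:lang are counted as duplicates too.
    """
    duplicates = {}
    depth = 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            depth += 1
            continue
        depth -= 1
        if depth == 1:  # a direct child of rdf:RDF has just been closed
            if elem.tag == CLASS_TAG:
                labels = [lab.text for lab in elem.findall(LABEL_TAG)]
                if len(labels) > 1:
                    duplicates[elem.get(RDF_ABOUT)] = labels
            elem.clear()  # drop the finished element's contents to save memory
    return duplicates


# Hypothetical input: class A has two labels, class B has one.
SAMPLE = io.BytesIO(b"""<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://example.org/A">
    <rdfs:label>first label</rdfs:label>
    <rdfs:label>second label</rdfs:label>
  </owl:Class>
  <owl:Class rdf:about="http://example.org/B">
    <rdfs:label>only label</rdfs:label>
  </owl:Class>
</rdf:RDF>""")

print(classes_with_duplicate_labels(SAMPLE))
```

To run the check on the real ontology, call `classes_with_duplicate_labels("go-plus.owl")`. Any classes it reports are candidates for the double labels that --no-check lets you bypass.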

Hi Jim, I started a new job, so I didn't have time to follow up, but I am running the command now. How much memory and time do you think it will need? I have a server with 120 GB of memory, 16 CPU cores, and a maximum walltime of 5 days. The first time, it ran for 24 hours and then failed.

@wkpalan were you using hermit? I had that in my example command because it was in yours, but I think you will have better success with elk.