yy20716/mondo-hpo-learner

Perform experiment on NCIT


Use either ncit.owl or neoplasm.owl (@balhoff to advise)

Unlike the other experiments, there is no separate associations file. Everything is axiomatized within NCIT. The goal is to find, in an unbiased way, which object properties work best for classifying diseases.

DL-Learner won't work out of the box because it uses instances as training data. We will want to materialize instance graphs for each subClassOf disease. There is some code in owltools that can be used, but it may be better to write clean code in Scala (or SPARQL?).

For each disease class

1. Weaken existing equivalence axioms to subClassOf
e.g. C = G1 and R1 some Y1 and R2 some Y2 .. ==>
C subClassOf G1, R1 some Y1, R2 some Y2, ...

2. For each subClassOf axiom, translate recursively to object property assertions.

E.g. if we have

C subClassOf R some D
D subClassOf D2
D2 SubClassOf R2 some E..
==>
i1 a C
i1 R i2
i2 a D
i2 R2 i3
i3 a E

This would form the instance graph input.
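The two steps above can be sketched roughly as follows. This is a minimal illustration, not the owltools code or a DL-Learner API: it assumes the axioms have already been extracted into plain Python structures, where a class expression is either a class name or a `("some", property, filler)` tuple. The names `weaken_equivalence`, `restrictions_of`, `materialize`, and the `max_depth` cutoff are all hypothetical.

```python
from itertools import count
from typing import Dict, FrozenSet, List, Tuple, Union

# A class expression is a named class (str) or an existential
# restriction written as ("some", property, filler). This encoding
# is an assumption made for the sketch.
Expr = Union[str, Tuple[str, str, str]]
Triple = Tuple[str, str, str]


def weaken_equivalence(cls: str, conjuncts: List[Expr]) -> List[Tuple[str, Expr]]:
    """Step 1: `cls = c1 and c2 and ...` becomes one `cls subClassOf ci` per conjunct."""
    return [(cls, c) for c in conjuncts]


def restrictions_of(cls: str, axioms: Dict[str, List[Expr]]) -> List[Tuple[str, str]]:
    """Collect (property, filler) pairs asserted on cls or on its named superclasses."""
    found: List[Tuple[str, str]] = []
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        for sup in axioms.get(current, []):
            if isinstance(sup, str):
                stack.append(sup)  # named superclass: keep climbing
            else:
                found.append((sup[1], sup[2]))
    return found


def materialize(cls: str, axioms: Dict[str, List[Expr]], max_depth: int = 5) -> List[Triple]:
    """Step 2: build an instance graph for cls as (subject, predicate, object) triples."""
    ids = count(1)
    triples: List[Triple] = []

    def visit(c: str, depth: int, path: FrozenSet[str]) -> str:
        ind = f"i{next(ids)}"
        triples.append((ind, "a", c))
        # Stop on cycles and at the (hypothetical) depth cutoff.
        if depth < max_depth and c not in path:
            for prop, filler in restrictions_of(c, axioms):
                child = visit(filler, depth + 1, path | {c})
                triples.append((ind, prop, child))
        return ind

    visit(cls, 0, frozenset())
    return triples


# The example from this issue: C subClassOf R some D, D subClassOf D2,
# D2 subClassOf R2 some E.
subclass_axioms: Dict[str, List[Expr]] = {
    "C": [("some", "R", "D")],
    "D": ["D2"],
    "D2": [("some", "R2", "E")],
}
triples = materialize("C", subclass_axioms)
weakened = weaken_equivalence("C", ["G1", ("some", "R1", "Y1"), ("some", "R2", "Y2")])
```

The cycle check and depth cutoff are there because a recursive expansion over a large ontology may not terminate otherwise; the right cutoff would have to be chosen by experiment.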

I would like to ask your suggestion if possible.

  • I have been working on pre-processing the NCIT ontology (ncit.owl) so that DL-Learner can run over it. The pre-processing step (i) builds dummy instances based on the above suggestion and (ii) adds the dummy instances to the original ncit.owl. DL-Learner then reasons over this expanded ncit.owl. The problem is that the reasoning step either does not finish or halts with an out-of-memory error, even on the pan server (and even when I assigned 700G of memory!). I am trying to see whether there is anything I can do.

  • I was initially using the ontology file from this page (https://github.com/ontodev/ncit-obo) and wonder whether it differs from the one on the OBO Foundry (http://obofoundry.org/ontology/ncit.html). I see that some URI prefixes are different, but I wonder whether there are any other differences. I am trying to see if I could cut irrelevant axioms out of ncit.obo to reduce the size of the ontology, i.e. is it okay to cut out tags such as synonym, def, and intersection_of? I wonder whether removing them affects the generation of subClassOf and someValuesFrom axioms in ncit.owl.
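A minimal sketch of the kind of tag filtering asked about above, assuming the input is OBO-format text. It drops only `synonym` and `def` lines, which are annotations; `intersection_of` is left in place because in OBO format it encodes a logical definition (an equivalence axiom), so removing it would change the logical content. The function name, the `DROP_TAGS` tuple, and the sample term IDs are hypothetical.

```python
from typing import Tuple

# Assumption: these tags carry annotations only and are safe to drop.
DROP_TAGS: Tuple[str, ...] = ("synonym", "def")


def strip_annotation_tags(obo_text: str, drop_tags: Tuple[str, ...] = DROP_TAGS) -> str:
    """Return obo_text without the tag-value lines whose tag is in drop_tags."""
    kept = []
    for line in obo_text.splitlines():
        tag = line.split(":", 1)[0].strip()
        if tag in drop_tags:
            continue
        kept.append(line)
    return "\n".join(kept)


# Made-up stanza for illustration; the IDs are not real NCIT terms.
sample = """[Term]
id: NCIT:C0000
name: example term
def: "A made-up definition." []
synonym: "example" EXACT []
is_a: NCIT:C0001
intersection_of: NCIT:C0001
intersection_of: NCIT:R0001 NCIT:C0002
"""
stripped = strip_annotation_tags(sample)
```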

Any other suggestions are also appreciated.

P.S. Is it better to use neoplasm.owl instead? The problem is that, for some reason, it does not have any someValuesFrom axioms.
P.S. Maybe I do not need to add the dummy instances to the original ncit.owl anymore?

Just following up here after sending this on Skype—the ontology file to use is this one:

http://purl.obolibrary.org/obo/ncit/neoplasm-core.owl

OK, I was finally able to get DL-Learner to process this neoplasm ontology. I saw many groups of extra classes from GO and UBERON but have not listed them in the index file for now. I uploaded the report files to this repository. I will work on building charts tomorrow.