geneontology/syngo2lego

Comments on first pass of syngo OWL

cmungall opened this issue · 13 comments

The following are not necessarily wrong, just observations of differences. cc @balhoff @kltm

Aside: Re: imports/declarations. This has always been awkward. In noctua-models we always store an import to go-lego.owl. This interferes with agile parsing using the OWLAPI (you need to either load all of lego.owl or rewire to a more minimal ontology using catalogs). But we either need this, or we need the declarations. Maybe this can be revisited now that we are in blazegraph.

  • Ontology IRI not provided
    • Bug - FIXED
  • No import declaration:
    • Deliberate right now, as it would make the files a pain to work with. I'll add the code and make the addition of this statement an optional switch so that I can play with test OWL files in Protege.
  • injects TBox and RBox declarations into model (necessary because of lack of imports)
    • If you just mean entity declarations - isn't this down to the choice of format (OWL-API adds them for non-imported entities when saving in OWL formats, but not in ttl?)
  • PR:P20020 rather than UniProtKB:P20020
    • Fixed.
  • Invents some OWL classes (http://purl.obolibrary.org/obo/Glutamatergic, http://purl.obolibrary.org/obo/target_overexpression)
    • Some evidence types still to be mapped in SynGO (plus test json a bit old). So this issue will be fixed in the actual JSON we use for loading.
  • contributor is not ORCID
    • Needs discussion as these models are partly curated by experts. We should probably be asking all of them for ORCIDs. Can bring up in next SynGO call.
  • uses lego:evidence (we switched to SEPIO didn't we @balhoff? https://github.com/geneontology/minerva/blob/master/specs/owl-model.md#axiom-annotations-and-evidence)
    • easy to switch. Guess I was using an old LEGO owl file as a model.
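
For the PR:P20020 versus UniProtKB:P20020 item above, a minimal sketch of how such a prefix rewrite might look. The function name `normalize_protein_curie` is hypothetical and is not taken from syngo2lego. The sketch assumes that only PR CURIEs whose local ID matches UniProt's published accession pattern should be rewritten, so that native PR identifiers are left alone.

```python
import re

# UniProt's published accession pattern (6- or 10-character accessions).
UNIPROT_ACCESSION = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)


def normalize_protein_curie(curie: str) -> str:
    """Rewrite PR:<UniProt accession> to UniProtKB:<accession>.

    Hypothetical helper: any other CURIE is returned unchanged.
    """
    prefix, sep, local_id = curie.partition(":")
    if sep and prefix == "PR" and UNIPROT_ACCESSION.match(local_id):
        return f"UniProtKB:{local_id}"
    return curie


print(normalize_protein_curie("PR:P20020"))    # UniProtKB:P20020
print(normalize_protein_curie("PR:000000001")) # PR:000000001 (left as-is)
```

Restricting the rewrite to accession-shaped IDs is a design choice made for this sketch; the actual fix in the converter may simply emit the UniProtKB prefix at the source.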

kltm commented

Just FYI, while ORCIDs are highly encouraged, they are not actually a requirement of the system.
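
Since ORCIDs are encouraged but optional, a converter could flag contributor values that are not ORCIDs rather than reject them. The following is a rough sketch under that assumption; `is_orcid` and `orcid_check_digit` are hypothetical names, and the check digit follows the ISO 7064 MOD 11-2 scheme that ORCID documents. The sample identifier is the one ORCID uses in its own documentation.

```python
import re

# Accepts a bare ORCID or its orcid.org URL form.
ORCID_PATTERN = re.compile(
    r"^(?:https?://orcid\.org/)?([0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X])$"
)


def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_orcid(contributor: str) -> bool:
    """True if the value is a well-formed ORCID with a valid check digit."""
    match = ORCID_PATTERN.match(contributor.strip())
    if not match:
        return False
    digits = match.group(1).replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


print(is_orcid("https://orcid.org/0000-0002-1825-0097"))  # True
print(is_orcid("SynGO expert"))                            # False
```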

@cmungall Re evidence model: This model has today's date on it and uses the old schema. So maybe I should keep it as is and switch over when noctua does?

New example output file in turtle format with import statement:

https://github.com/geneontology/syngo2lego/blob/master/example_output/SynGO_132.ttl

Only includes genuine OBO classes. Still includes Class declarations. If those are a problem, I need advice on how to get rid of them.

You're right about the evidence property, I thought we'd made the switch.

I agree about the declarations-imports. I would actually prefer we did it your way. I don't think your way will cause any problems. Note that the imports will be injected back in when we do the backup save to noctua-models, and also when it is retrieved in-memory from blazegraph.

Primarily for aesthetic reasons, can we have the model/ontology IRI have no underscores and no ".owl" suffix? Really the only thing you need to conform to is having the URL start model.geneontology.org/ (this gets injected into the CURIE map at startup), but I think it's good to have consistency here.

I've dropped the .owl file extension. For parsing, it would be useful to keep a consistent separator between namespace and SynGO identifier. If not underscore, would a dash be OK?
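
To make the separator question concrete, here is a minimal sketch of building and parsing a model IRI with a configurable separator. Everything in it is an assumption for illustration: the `http://` scheme on the base, the `SynGO` namespace token, the function names, and the dash default, which is still an open question in this thread.

```python
# Assumed base; the thread only requires that the URL start with
# model.geneontology.org/ (the http:// scheme is a guess here).
MODEL_BASE = "http://model.geneontology.org/"
NAMESPACE = "SynGO"  # assumed namespace token


def build_model_iri(syngo_number: int, separator: str = "-") -> str:
    """Build a model IRI with no '.owl' suffix."""
    return f"{MODEL_BASE}{NAMESPACE}{separator}{syngo_number}"


def parse_model_iri(iri: str, separator: str = "-") -> int:
    """Recover the numeric SynGO identifier, or raise ValueError."""
    if not iri.startswith(MODEL_BASE):
        raise ValueError(f"not a model IRI: {iri}")
    local = iri[len(MODEL_BASE):]
    namespace, sep, number = local.partition(separator)
    if namespace != NAMESPACE or not sep or not number.isdigit():
        raise ValueError(f"unexpected local part: {local}")
    return int(number)


iri = build_model_iri(132)
print(iri)                   # http://model.geneontology.org/SynGO-132
print(parse_model_iri(iri))  # 132
```

A single fixed separator keeps the parse to one `partition` call, which is the practical argument for keeping some delimiter between namespace and identifier.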

What will the dataflow be? We have no way to prevent edits on these models. Do they 'belong' to Noctua once added?

kltm commented

It may be possible to add a "locked" annotation state. For now, it could be respected on the client; later, prevented from modification at the server end as well.
Not a lot of bandwidth to dig in at this point though.

Close, out of date?

Think so. Looks like superseded by geneontology/noctua-models#55. Have no idea if this turned into an ongoing import. Would be a great shame if not.