Define design patterns using standard format
- create yaml file for each DP, see for example uberon patterns
- set up pipeline to use dosdp-tools to generate TSVs
- consider changing pipeline to derive portions of ontology direct from CSVs
UPDATED Let's punt on this for now:
- align with HBP, e.g. SciCrunch/NIF-Ontology@ebb80d8#commitcomment-19578799 cc @tgbugs
This was auto-generated, can be done as a starting point:
https://github.com/cmungall/owl_patternizer/tree/master/examples/cl
This is probably best done after #533
I will work on this after #533 is closed.
@addiehl needs to add new classes. I am recommending he just goes ahead and adds classes in Protege for now. But post any questions about patterns here.
Remember to work in PRs
actually @nicolevasilevsky there is nothing to stop your working on a PR with the yaml for now
Ok! I'll go ahead and work on it then. :)
@cmungall I am doing PRs to your repo (see https://github.com/cmungall/owl_patternizer/pulls), should I move these new patterns over to this repo (CL)?
@cmungall I reviewed all the patterns in your repo and made some minor edits and did PRs.
Should I create additional patterns for CL, or will you do so via your auto-generated method (which is very cool!)
Let me know if I can do anything to help with the fourth point. A bit more documentation can be found in https://github.com/SciCrunch/NIF-Ontology/blob/master/docs/Neurons.md and https://github.com/tgbugs/pyontutils/blob/master/docs/NeuronLangExample.ipynb.
@nicolevasilevsky did we make any progress on this?
I haven't worked on this in a while - is it high priority?
Looks like Nico added the templates for the templates to this repo here:
https://github.com/obophenotype/cell-ontology/tree/master/src/patterns/dosdp-pattern-workshop
And it looks like I got started on a couple patterns.
I can work on this further, if it is high priority, let me know.
yes, that is where they came from. Got it - I will review these.
Should these be moved out of the dosdp-pattern-workshop folder and into the patterns folder? I don't think we are planning another dosdp workshop at the moment.
cc @matentzn
dosdp-pattern-workshop has nothing to do with any workshop :) It just means the patterns are works in progress and should not be used until finalised. So yes, judgement needs to be applied. We can finalise these in the workshop folder and then move them over to dosdp-patterns when they are ready to be reviewed. I can help!
sounds great, thanks @matentzn
@nicolevasilevsky - are we using a standard label or GitHub project to track these?
Naming conventions. Looks like we are using camelCase with a leading lowercase. Let's keep doing this for now for consistency, but I suggest later renaming to use snake_case.
No, we don't have a label or GitHub project, but I will create both.
uPheno uses camelCase with a leading lowercase (see examples here). Personally, I think it would be nice to be consistent with uPheno.
I didn't know this was called snake_case!
Looking back now I would have preferred snake_case, but it is too much effort now. I will consider this as part of a big general review. Let's do camel case for now.
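If the rename to snake_case is picked up later, the filename conversion itself is mechanical. A minimal sketch, assuming pattern names are plain lowerCamelCase with no acronym runs (the helper name `camel_to_snake` is made up for illustration):

```python
import re


def camel_to_snake(name: str) -> str:
    """Convert a lowerCamelCase pattern name to snake_case."""
    # Put an underscore before every uppercase letter that follows
    # a lowercase letter or digit, then lowercase the whole string.
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()


if __name__ == "__main__":
    print(camel_to_snake("abnormalCell"))
```

Anything that references a pattern by filename (IRIs, make goals) would need the same rename, which is the real cost mentioned above.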
Why would you need a separate repo? Like a place that keeps track of all patterns anywhere? We should definitely use a standard tagging system. Is the purpose to identify all tickets and pull requests that relate to the definition/design of patterns? If so, I would suggest using either pattern or dosdp. What do you think? Do we need anything more fine-grained?
@matentzn should we move these patterns into a different folder called dosdp-patterns?
I created a label called pattern. We have a similarly named label in Mondo.
All actual patterns should be in the dosdp-patterns directory. Anything that is not (yet) explicitly intended to be used as a pattern should not. So yes! When you finalise a pattern, always move it to dosdp-patterns!
I'm saying for ticket organization don't do BOTH labels AND projects
got it - we just have a label now. No project.
Ok gotcha. Then I would favour labels over projects. Projects is more useful for larger complex projects imo..
Not sure why you are so mean to this poor directory of patterns in progress 😄 But if you want to do it right then @nicolevasilevsky, delete all patterns in the in-progress dir; delete the in-progress dir and make draft pull requests for all of them (draft while in draft state, undraft when ready for review).
Just kidding :) no problem!
I moved the patterns to PRs, please review: #718
Nicole, go ahead and merge, I made lots of comments on things that need fixed but easier to do post-merge, does no harm to have duff patterns in for now
@matentzn - what should our strategy be for keeping the derived TSVs up to date - GH actions?
the simplest thing to do would be to run the matching as part of the release, similar to DOSDP generate like this:
- we create a new component to cl, components/dosdp-annotations.owl
- for the make goal generating that component, we run dosdp-query on all patterns over the ontology
- from the generated TSVs, we generate the tags (sets of annotations) like:
<http://purl.obolibrary.org/obo/CL_000000> :dosdp-pattern <http://purl.obolibrary.org/obo/cl/patterns/abnormalCell.yaml>
- these tags go into components/dosdp-annotations.owl
- components/dosdp-annotations.owl is imported into cl-edit.owl
- the normal CL release process continues
What do you think? Good enough?
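To make the tag-generation step concrete, here is a rough sketch of turning one TSV of matches into tag triples. It is not part of any existing pipeline. Assumptions: the TSV has a defined_class column holding the term IRI, and the property behind the :dosdp-pattern shorthand is stood in for by a placeholder IRI, since no property was settled on here.

```python
import csv
import io

# Assumption: placeholder IRI for the ":dosdp-pattern" shorthand; not a real property.
DOSDP_PATTERN_PROPERTY = "http://example.org/dosdp-pattern"
# Base IRI taken from the example triple in this thread.
PATTERN_BASE = "http://purl.obolibrary.org/obo/cl/patterns/"


def tags_from_tsv(tsv_text, pattern_name, class_column="defined_class"):
    """Return one N-Triples line per matched term in a pattern TSV."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    pattern_iri = f"{PATTERN_BASE}{pattern_name}.yaml"
    lines = []
    for row in reader:
        # Skip rows where the class column is missing or empty.
        term = (row.get(class_column) or "").strip()
        if term:
            lines.append(f"<{term}> <{DOSDP_PATTERN_PROPERTY}> <{pattern_iri}> .")
    return lines


if __name__ == "__main__":
    # Illustrative input only; the term IRI is made up.
    example = "defined_class\tdefined_class_label\nhttp://example.org/TERM_1\texample term\n"
    for line in tags_from_tsv(example, "abnormalCell"):
        print(line)
```

The output lines could be concatenated per pattern and converted into the component by whatever the make goal already uses; that conversion is left out here.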
Great - let's discuss the modeling in another forum as not specific to CL
are there more standard properties we can use?
https://lov.linkeddata.es/dataset/lov/terms?q=implements
or maybe implement a proper vocabulary for templates?
One thing I'm unsure about with this workflow is that terms will be annotated automatically with whether they conform to a pattern—but this will not help to know whether a term was specifically intended to implement a pattern, and whether it now does or doesn't.
There are two use cases:
- provenance: these should go on the generated statements themselves as axiom annotations.
  - pattern conformance (this is what Chris refers to as inferred, but the word does not seem right; I prefer conforms, like dc:conformsTo)
  - pattern generated. This explicitly states that a statement was generated from a pattern (which implies conformance), for that I am liking: https://www.w3.org/ns/prov#wasGeneratedBy
- quality control: here I want to be able to quickly do some SPARQL checks over terms that are tagged in some way with a meta category. For example, I want to say: terms conforming to pattern A should not be a subclass of terms conforming to pattern B. I do not want to deal with annotation assertions to achieve this. I want to use a very general tag here as well; I may use DOSDP to assert this tag, or a simple SPARQL match/update. I want to use this to warn the editor of a term right away that the term conforms to some "category of things", which they can review. I want to combine tags into complex QC queries. For this, I could use anything, including rdfs:comment, or a bespoke OMO (IAO) property like qcTag or something along these lines.
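The QC check described above ("terms conforming to pattern A should not be a subclass of terms conforming to pattern B") can be sketched without any triple store. This is only an illustration of the logic, assuming the tags and the asserted subclass edges have already been exported into plain Python structures; the function names and the sample identifiers are made up.

```python
from collections import defaultdict


def ancestors(term, parents):
    """All strict superclasses of term, following subclass edges transitively."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def disjoint_pattern_violations(tags, subclass_of, pattern_a, pattern_b):
    """Return (term, superclass) pairs where a term tagged with pattern_a
    sits below a term tagged with pattern_b."""
    parents = defaultdict(set)
    for child, parent in subclass_of:
        parents[child].add(parent)
    tagged_b = {term for term, patterns in tags.items() if pattern_b in patterns}
    violations = []
    for term, patterns in sorted(tags.items()):
        if pattern_a in patterns:
            for sup in sorted(ancestors(term, parents) & tagged_b):
                violations.append((term, sup))
    return violations


if __name__ == "__main__":
    # Made-up terms and tags, for illustration only.
    tags = {"X1": {"patternA"}, "X2": {"patternB"}, "X3": {"patternA"}}
    subclass_of = [("X1", "M"), ("M", "X2"), ("X3", "Z")]
    print(disjoint_pattern_violations(tags, subclass_of, "patternA", "patternB"))
```

In the real pipeline the same check would more likely be a SPARQL query over the tagged ontology; the point is just that a lightweight term-level tag is enough to express it.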
Yes that sounds a great way to distinguish stuff. I guess we are all not too worried about bloating our ontologies with this kind of stuff - I am not.
Some further thoughts from a discussion with @dosumis
- Do we need a date stamp to indicate when the match was run?
- Should this 3-level tagging (intendedToConform, conforms, generatedBy) be on axiom level only or also term level? For my purposes, I think axiom level is enough, and I can use something more lightweight on term level to simplify my QC checks.
I regenerated:
https://github.com/cmungall/owl_patternizer/tree/master/examples/cl
This fixes the bug where uberon/go etc terms were included in the generalization
see above for caveats. These are autogenerated and to be used as seeds. Use judgment
given discussions here: https://docs.google.com/document/d/1XvMbNvr0FEsdqGhg79BYCYEHSqUxRHMcvhbGizEAht8/edit#bookmark=id.699u5qrobewr
tech group will put this on the backburner
please place back in the tech board when more discussion on how best to implement this is done