INCATools/dead_simple_owl_design_patterns

proposed design pattern design patterns / naming conventions

Opened this issue · 3 comments

rationale

names will be exposed in documentation for editors, curators, and general end-users who wish to understand the general patterns of the ontology, so it is important to use consistent, clear, non-ontologist language

these will also be exposed as headers in TSVs

many of the conventions that apply to programming languages and data models and schemas apply here

it is also useful to think of patterns as metaclasses, whose instances are owl classes. The pattern instances may also correspond to what scientists think of as 'entities'.

principles

always use meaningful names

  • don't use var names like v. Spell it out.
  • use terms a domain scientist would understand
  • use spaces, not underscores
  • never use camel case

exception: filename / IRI should use underscores not spaces. however, for human-readable labels change the underscores to spaces
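A minimal Python sketch of the filename-versus-label convention above. The helper names (`label_from_filename`, `filename_from_label`, `naming_problems`) and the `.yaml` extension are assumptions for illustration; none of this is existing dosdp tooling.

```python
import re

def label_from_filename(filename: str) -> str:
    """Turn a filename such as 'disease_by_location.yaml' into a human-readable label."""
    stem = filename.rsplit(".", 1)[0]  # drop the extension if there is one
    return stem.replace("_", " ")

def filename_from_label(label: str, extension: str = "yaml") -> str:
    """Inverse direction: spaces become underscores for filenames / IRIs."""
    return label.replace(" ", "_") + "." + extension

def naming_problems(name: str) -> list:
    """Return a list of convention violations for a human-readable name."""
    problems = []
    if "_" in name:
        problems.append("uses underscores instead of spaces")
    if re.search(r"[a-z][A-Z]", name):  # lowercase followed by uppercase
        problems.append("looks like camel case")
    if len(name.strip()) < 3:
        problems.append("too short to be meaningful")
    return problems
```

Because the label and filename are derived from each other, a check like this could also enforce that the filename matches the pattern name.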

vars should be named by relationship or role of range, not range itself

e.g. if a disease pattern has a variable to specify the location and the range is anatomical structure, call it location, not anatomical structure

rationale: later on you may want a sub-pattern where you have a separate var with the same range

genus vars should be named non-generically

E.g. disease by location, with 2 vars: one for the genus, the other for the location. Do not call the genus var 'disease'. Call it something like

  • parent disease
  • disease group
  • disease class
  • ...

also consider something like "morphological type" if that describes the genus relation

rationale: just 'disease' is too broad. See previous principle. later we may add a sub-pattern that references another disease

name the pattern by the identity criteria

the set of vars that are used in the equivalence axiom constitutes the compound key. these are the identity criteria

e.g. a pattern for subtyping leiomyosarcomas by location. do not call this 'leiomyosarcoma'. call it 'leiomyosarcoma by location'

rationale: we may later add leiomyosarcomas subtyped by gene. we can't have two called 'leiomyosarcoma'

in general a good pattern is to name the pattern by the sequence of elements in the equivalence axiom, where the elements are the named classes (the things in single quotes) and var names.
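As a rough illustration, the Python sketch below treats the vars of the equivalence axiom as the compound key and builds a candidate name from the axiom's elements in order. The dictionary layout loosely mirrors a dosdp pattern (`equivalentTo` with `text` and `vars`). The relation label 'disease has location', the "X by Y and Z" joining rule, and the function names are assumptions chosen to reproduce the examples in this issue, not an agreed algorithm.

```python
import re

def identity_criteria(pattern: dict) -> tuple:
    """The vars used in the equivalence axiom form the pattern's compound key."""
    return tuple(pattern["equivalentTo"]["vars"])

def suggest_name(pattern: dict) -> str:
    """Name the pattern from the named classes and vars of the equivalence axiom."""
    text = pattern["equivalentTo"]["text"]
    variables = iter(pattern["equivalentTo"]["vars"])
    relations = set(pattern.get("relations", []))  # quoted relations are skipped
    elements = []
    # Walk the axiom left to right: quoted names and %s placeholders.
    for quoted, placeholder in re.findall(r"'([^']+)'|(%s)", text):
        if placeholder:
            elements.append(next(variables))
        elif quoted not in relations:
            elements.append(quoted)
    if not elements:
        raise ValueError("equivalence axiom has no named classes or vars")
    if len(elements) == 1:
        return elements[0]
    return elements[0] + " by " + " and ".join(elements[1:])

# Hypothetical pattern with a fixed genus and one var.
fixed_genus = {
    "relations": ["disease has location"],
    "equivalentTo": {
        "text": "'leiomyosarcoma' and ('disease has location' some %s)",
        "vars": ["location"],
    },
}
print(suggest_name(fixed_genus))
```

A later pattern that subtypes leiomyosarcomas by gene would have a different compound key and therefore a different name, so the two cannot collide.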

Some recommended changes for mondo patterns

  • adult => adult form of disease

use consistent vocabulary

e.g. "adult form of disease" is OK as a name. "adult variant of disease" is not good if we use "variant" to mean a non-subclass variant

long names are not necessarily bad

we don't pay for characters, don't worry too much about length, within reason

use the term specific as appropriate (TBD)

consider a pattern name 'cancer by location'. This is ambiguous. Do we mean:

  • pattern with 2 vars: (1) cancer morphological type [genus] (2) location
  • pattern with 1 var: location [the genus is fixed at 'disease']

consider prefixing with "specific"; e.g. the first would be called "specific cancer by location"; alternatively "cancer subtype by location"

Perhaps we should even call the 2nd "cancer (general) by location" (TBD; this is awkward)

avoid X in name

always use a meaningful name

use the same filename as pattern name

description should describe the pattern instances not the class instances

E.g.

mondo leiomyosarcoma

https://mondo.readthedocs.io/en/latest/editors-guide/patterns/leiomyosarcoma/

An uncommon, aggressive malignant smooth muscle neoplasm, usually occurring in post-menopausal women that is characterized by a proliferation of neoplastic spindle cells that is located in a specific anatomical location.

This is not a good pattern description: it describes leiomyosarcomas, not leiomyosarcoma classes

Instead:

This pattern is for classes representing leiomyosarcomas differentiated by where they are found in the body. Leiomyosarcomas are uncommon, aggressive malignant smooth muscle neoplasms.

include motivation

E.g. leiomyosarcomas can occur in different sites in the body so we include this pattern to...

include examples

As well as auto-examples, include manually selected examples that highlight key aspects

TODO: we should have a specific field for listing this. These should then be used as unit tests

include minimal metadata

  • status
  • contributors
  • authors
  • links to tickets
  • date of creation
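One way such a metadata list could be enforced is a simple completeness check, sketched here in Python. The key names (`status`, `contributors`, `authors`, `tickets`, `date_created`) are hypothetical placeholders for the items listed above; they are not claimed to be existing dosdp fields.

```python
# Hypothetical keys standing in for the minimal metadata listed above.
REQUIRED_METADATA = ("status", "contributors", "authors", "tickets", "date_created")

def missing_metadata(pattern: dict) -> list:
    """Return the required metadata keys that are absent or empty in a pattern."""
    return [key for key in REQUIRED_METADATA if not pattern.get(key)]

# Example: a draft pattern that only records status and contributors.
draft = {
    "pattern_name": "leiomyosarcoma by location",
    "status": "draft",
    "contributors": ["example contributor"],
    "authors": [],
}
print(missing_metadata(draft))
```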

document rules

some patterns may be associated with rules: sparql, regexes, python, ... document these

be specific with range constraints

avoid owl:Thing

consider unions rather than going up the hierarchy if a specific class doesn't exist

challenges: for upper level terms we want to use cob but it is not yet ready

be careful not to specify ranges too narrowly and accidentally prevent some classes from being matched. This is why examples / unit tests (see above) are vital

patterns should be disjoint

this is more of an aspiration at the moment

consider 2 patterns

  • cancer subtype by location; 2 vars: cancer_subtype, location
  • cancer (grouping) by location, 1 var: location

any class that conforms to the 2nd will also conform to the first. Ideally we could extend dosdp to be able to say: the range of this var is a proper subclass of cancer
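The overlap, and the effect of a "proper subclass" constraint, can be shown with a toy hierarchy in Python. The hierarchy below is a hand-made illustration rather than an extract from mondo, and the `proper` flag is a stand-in for the proposed dosdp extension, which does not exist in this form.

```python
# Toy child -> parent hierarchy, for illustration only.
TOY_PARENTS = {
    "leiomyosarcoma": "sarcoma",
    "sarcoma": "cancer",
    "carcinoma": "cancer",
    "cancer": "disease",
}

def ancestors(cls: str, parents: dict = TOY_PARENTS) -> list:
    """All strict ancestors of cls, nearest first."""
    result = []
    while cls in parents:
        cls = parents[cls]
        result.append(cls)
    return result

def matches_subtype_pattern(genus: str, proper: bool = True) -> bool:
    """'cancer subtype by location': genus must fall in the range of cancer_subtype."""
    if genus == "cancer":
        return not proper  # cancer itself only matches a non-proper range
    return "cancer" in ancestors(genus)

def matches_grouping_pattern(genus: str) -> bool:
    """'cancer (grouping) by location': the genus is fixed at cancer."""
    return genus == "cancer"

# Without the proper-subclass constraint the two patterns overlap on 'cancer'.
overlap = matches_subtype_pattern("cancer", proper=False) and matches_grouping_pattern("cancer")
# With it, 'cancer' matches only the grouping pattern, so the patterns are disjoint.
disjoint = not matches_subtype_pattern("cancer", proper=True)
```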

TODO: we should have a specific field for listing this (examples). These should then be used as unit tests

I think this is now supported.

But they are a bit underspecified as strings - and the pattern library may not have access to the contextual axioms required to perform a test, for example if the examples are HP examples and the ontology is XPO (importing uPheno patterns). So XPO running a unit test checking an HPO class would not make sense. Better to say: could be used for unit tests.

I like everything you are proposing. One thing that is really hard for me, though, is getting people to write better pattern descriptions. Can we come up with a grammar for human-readable pattern descriptions, analogous to the gd patterns for term definitions, which people can copy, paste and fill in? Your example "This pattern is for classes representing leiomyosarcomas differentiated by where they are found in the body. leiomyosarcomas are uncommon, aggressive malignant smooth muscle neoplasms" sounds like it's following this pattern:

This pattern is for classes representing X {differentiated by Y}. X are {IAO:115 definition of X?}.

Seems a bit bare, as one of the key features of the definition should be to differentiate the pattern from similar patterns. So, I would like a sentence like:

This pattern does not cover X, see Y

For example, for the "increased rate" pattern:

This pattern does not cover the case where a process output is increased. Consider "increased efficacy" pattern instead.

Yes, I think negative examples are always good.

This pattern is for representing subclasses of X, where the discriminating factors are {Y, Z, ...}. An example is FOO, which is a X differentiated by .... This pattern would not be used for ...
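A fill-in template like the one above could be generated mechanically, as in this Python sketch. The function name, its parameters, and the example values (including "uterine leiomyosarcoma") are illustrative assumptions; the wording follows the template proposed in this thread.

```python
def describe_pattern(genus, factors, example, example_differentia,
                     excluded_case, alternative=None):
    """Fill in the proposed description grammar for a pattern."""
    text = (
        f"This pattern is for representing subclasses of {genus}, "
        f"where the discriminating factors are {', '.join(factors)}. "
        f"An example is {example}, which is a {genus} "
        f"differentiated by {example_differentia}. "
        f"This pattern would not be used for {excluded_case}."
    )
    if alternative:
        # Optional pointer to the similar pattern that covers the excluded case.
        text += f' Consider the "{alternative}" pattern instead.'
    return text

description = describe_pattern(
    genus="leiomyosarcoma",
    factors=["location"],
    example="uterine leiomyosarcoma",
    example_differentia="its location in the uterus",
    excluded_case="leiomyosarcomas subtyped by gene",
)
print(description)
```

Making the negative case a required argument is a deliberate choice here: it forces every description to say what the pattern does not cover.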