INCATools/dead_simple_owl_design_patterns

New method request: dosdp normalise

Opened this issue · 6 comments

We need a method in dosdp which takes in a dosdp yaml file an returns it in a nicely canonical form:

We will use the ordering directly from the json schema, its perfect:

https://github.com/INCATools/dead_simple_owl_design_patterns/blob/master/src/schema/dosdp_schema.yaml#L299

Test data (bad examples):
https://github.com/monarch-initiative/mondo/tree/master/src/patterns/dosdp-patterns

Good example (mostly good):
https://github.com/obophenotype/upheno/blob/master/src/patterns/dosdp-patterns/abnormalAnatomicalEntity.yaml

I used this script back in the days:
https://github.com/monarch-initiative/mondo/blob/master/src/scripts/repair_mondo_patterns.py

@dosumis any chance we can have @hkir-dev work on that, if he wants?

Hi Nico, yes this is in my todo list. You can assign it to me.

I faced some issues while ordering comments. Ruamel.yaml is associating comments with the preceding element (https://stackoverflow.com/a/42173906). Based on the context, this behavior sometimes make sense, but sometimes cause confusion. For example:

relations:
  part_of: "BFO:0000050"
#  in_taxon: "RO:0002162"

# My comment on vars is ....
vars:
  gross_cell_type: "'cell'"

So in this case both comments belong to relations (actually for ruamel they all are a single comment together). If I change order of the relations or vars, "My comment on vars is ...." is sticking with the relations.

vars:
  gross_cell_type: "'cell'"

relations:
  part_of: "BFO:0000050"
#  in_taxon: "RO:0002162"

# My comment on vars is ....

Do you have any suggestions for the fix. Or, we can release the initial version as is (just warn user to check comments in the pattern file) and try to fix this issue in later versions.

Hmmm... interesting.. I am fine with a warning for now.. I don't think you can really do much, I don't think it is worth it to write a new parser.. I would suggest to add to the style guides that comments should be under the elements they pertain to.

Hmmm. I think that moving comments around risks making well annotated patterns confusing.

@matentzn - can you remind me of the rationale for this again. I don't see a massively strong case for improving diffs (please correct me if I'm wrong). If it's just about readability then doesn't the markdown conversion deal with that? (we could add cannonical ordering there).

I think having comments in a pattern like this is an antipattern all by itself, but that is another discussion.

Well, the question you are asking is the same as: "Is black useful in python?" Of course we can find an endless stream of arguments for an against it. I believe that canonical ordering, canonical quoting, canonical documentation practices makes patterns not only more readable which is a huge argument when managing hundreds of patterns like we do. We just want one way to format the patterns, else I am sitting there until kingdom cometh explaining to people how to use quotes properly, when to put "" around entities and when not, how to best represent three levels of quoting in template strings and so on. Its much easier for me to quickly look at a pattern and say: that is how it should be. I don't want to have 400 patterns in my library with every one looking totally different then the last, with different element orders, quoting and spacing.

I am definitely pro having a normalise method, and I would recommend we go ahead, but I am not dying on that hill. It feels like having a black for just about any data artefact we produce would be useful. But no, we it wont revolutionise the world of ontology engineering, so I guess you will still have to make a call.