ideal output for reviewing LinkML enums?
turbomam opened this issue · 3 comments
turbomam commented
Especially enums that have meaning added with linkml_model_enrichment/annotators/enum_annotator.py
- Are there multiple enums with the same meaning?
  - If so, how do we repair that? Make some of the enum names synonyms?
- Which enums have no meaning assigned?
- Which enum meanings are suspect because the name and the meaning-based description are lexically different? How much of a difference is noteworthy?
  - A change in one letter or digit in a strain (organism) might indicate an entirely wrong meaning assignment
  - But meaning can be assigned based on a synonym, in which case the name and description could be entirely different
  - What string distance metric should we use? Cosine? SIFT4?
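
A rough sketch of what such a review report could compute, just to make the questions above concrete. Everything here is an assumption for illustration: the `enum_values` dict, the `EX:` CURIEs, and the function names are hypothetical and are not part of `enum_annotator.py`. It also uses `difflib.SequenceMatcher` from the standard library as a stand-in similarity score, not cosine or SIFT4.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical input: enum name -> (meaning CURIE or None, ontology label or None).
# The EX: identifiers and labels are made up for this sketch.
enum_values = {
    "strain_A1": ("EX:0001", "strain_A1"),
    "strain-A1": ("EX:0001", "strain_A1"),
    "bakers_yeast": ("EX:0002", "Saccharomyces cerevisiae"),
    "unmapped_strain": (None, None),
}


def duplicate_meanings(values):
    """Return meaning -> list of enum names, for meanings used more than once."""
    by_meaning = defaultdict(list)
    for name, (meaning, _label) in values.items():
        if meaning is not None:
            by_meaning[meaning].append(name)
    return {m: names for m, names in by_meaning.items() if len(names) > 1}


def missing_meanings(values):
    """Return the enum names that have no meaning assigned."""
    return [name for name, (meaning, _label) in values.items() if meaning is None]


def similarity(a, b):
    """Case-insensitive similarity in [0, 1]; 1.0 means identical strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def suspect_assignments(values, threshold=0.8):
    """Return (name, label, score) for assignments scoring below the threshold."""
    suspects = []
    for name, (meaning, label) in values.items():
        if meaning is None or label is None:
            continue  # reported separately by missing_meanings
        score = similarity(name, label)
        if score < threshold:
            suspects.append((name, label, score))
    return suspects


if __name__ == "__main__":
    print("duplicates:", duplicate_meanings(enum_values))
    print("missing:", missing_meanings(enum_values))
    for name, label, score in suspect_assignments(enum_values):
        print(f"suspect: {name!r} vs {label!r} (similarity {score:.2f})")
```

Note that a single threshold runs into both problems listed above: a one-character strain difference still scores close to 1 and would not be flagged, while a correct synonym-based assignment (like the `bakers_yeast` row) scores low and would be. The `threshold=0.8` default is arbitrary.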
turbomam commented
python linkml_model_enrichment/annotators/enum_annotator.py \
--modelfile availabilities.yaml \
--tabular_outputfile mapping.log \
--ontoprefix NCBITaxon \
--enum_list strained_enum \
--replaced_chars Z
Hopefully the enum names don't contain any `Z`s! I wanted to be sure not to drop `_`s or `-`s.
turbomam commented
poetry run python md2tsvs/md2tsvs.py \
--mdfile ../synbio-schema/handcrafted/generated/docs/binomial_name_enum.md \
--distcol0 0 --distcol1 1 \
--static_table_num 1
turbomam commented
Note that some strains aren't getting valid meaning assignments because NCBI hasn't recorded many strains for the organism, like https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=4952&lvl=3&lin=f&keep=1&srchmode=1&unlock