umcu/dutch-medical-concepts

Check completeness of terminology

Closed this issue · 4 comments

Based on testing with the Mantra GSC dataset, a number of concepts are present in the Mantra annotations but not in the Dutch-umls terminology (current version: 1.8).

Three random ones that seem important:

C0042866 (vitamine D)
C0021641 (insuline)
C2827407 (middenoorontsteking)

Note that it is not the synonym that is missing, but the entire concept. It would be good to check the extent of this problem and think of any possible mitigations.

Full list of concepts in Mantra but not in Dutch-umls-1.8 (~190 items): excluded_annotations.txt
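For reference, a minimal sketch of how such a list can be derived: a set difference between the CUIs used in the annotations and the CUIs in the terminology. The function name `missing_cuis` and the toy inputs are assumptions for illustration only; the real inputs would be the CUIs parsed from the Mantra annotation files and from the Dutch-umls release.

```python
def missing_cuis(annotated_cuis, terminology_cuis):
    """Return the CUIs used in annotations but absent from the terminology, sorted."""
    return sorted(set(annotated_cuis) - set(terminology_cuis))


# Toy inputs (assumption): a handful of CUIs mentioned in this issue.
mantra_cuis = ["C0042866", "C0021641", "C2827407", "C0037473"]
dutch_umls_cuis = {"C0037473", "C1565489"}

missing = missing_cuis(mantra_cuis, dutch_umls_cuis)
print(f"{len(missing)} missing concepts: {missing}")
```

Working on sets of CUIs rather than on term strings matches the point above: the whole concept is absent, not just one synonym.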

Updated list with dutch-umls-1.10-with-drug-names (~140 items):

onbekende_concepten_1.10_with_drugs.txt

I manually checked a sample of the missing concepts (116 unique terms). The most common reasons:

  • We excluded the concept from umls deliberately (e.g. TUI filtering on animals)
  • Annotators used a deprecated concept (e.g. C0341697, merged/replaced by C1565489 = renal insufficiency)
  • Annotators tagged a specific concept that is not present in dutch-umls, but a more general concept is present (e.g. annotated as C0037570 = dietary sodium, while dutch umls only contains C0037473 = sodium)
  • Terms that do not seem very interesting/relevant in the first place (e.g. bestanddeel, componenten, levensmiddelen)
  • Dutch umls seems to miss many terms for drug dosage forms and ways of administering drugs, such as: tablet, injectie, filmomhulde tablet, etc.
  • Some terms are just missing from Dutch umls: agenerase, antibiotica (the main problematic one remaining imo), cholestagel, etc.

It's impossible to know the exact extent of missing concepts, but after diving into it a bit, the problem seems rather small.
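Some of the reasons listed above could be checked automatically before resorting to a manual review. The sketch below is only an illustration: `triage`, `replaced_by`, and `broader` are hypothetical names, and the mappings are hand-filled with the examples from this issue. In practice they would have to come from UMLS itself (for example its retired-CUI and hierarchy data), which is not shown here.

```python
def triage(cui, terminology, replaced_by, broader, excluded=frozenset()):
    """Assign a missing-concept reason to a CUI, checking the cheapest explanations first."""
    if cui in terminology:
        return "present"
    if cui in excluded:
        return "excluded deliberately"
    if replaced_by.get(cui) in terminology:
        return f"deprecated: replaced by {replaced_by[cui]}"
    if broader.get(cui) in terminology:
        return f"broader concept present: {broader[cui]}"
    return "missing"


# Hand-filled toy data (assumption), using the examples from this issue.
terminology = {"C1565489", "C0037473"}
replaced_by = {"C0341697": "C1565489"}  # deprecated CUI -> current CUI
broader = {"C0037570": "C0037473"}      # dietary sodium -> sodium

for cui in ["C0341697", "C0037570", "C0003232"]:
    print(cui, "->", triage(cui, terminology, replaced_by, broader))
```

Anything that still comes out as "missing" (here C0003232, antibiotica) is what would remain for manual inspection.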

Full list of unique terms with my comments for some: missing_concepts_annotated.csv

My suggestion for next steps would be to close this issue and pick it up again when we encounter actual missing concepts in use cases. I think the problem is too small, and I also don't see any sensible and realistic mitigation.

@sandertan What do you think?

@vmenger Indeed, sounds good to pick up specific issues when they become a problem. Still, I think it would be nice to at least add "antibiotica", since that might be an important one. Let's add that one (https://uts.nlm.nih.gov/uts/umls/concept/C0003232) in the next big update.

Added a custom concept for antibiotica, closing this issue.