petermr/dictionary

Validation Steps that MUST be added

EmanuelFaria opened this issue · 0 comments

  • MUST detect and "handle" (compare/combine) plurals (e.g. CHG/CHGs) when searching abbreviations
  • MUST normalize diacritics (e.g. é, which occurs in French)
  • MUST find runs of more than one space and replace them with a single space
  • MUST find and delete any spaces immediately before or after the quotes (") in an entry element
  • add metadata or attribute called [[[entry detection/selection method??]]] indicating which tool was used to generate each dictionary entry (spacy, RAKE, YAKE, etc.)
  • MUST detect only exact duplicates in @term, comparing everything between the quotes of entry term=" " (see examples below). Currently, the validator is marking "apium graveolens" (and others) with "duplicate terms not allowed apium graveolens", even though the full terms differ:
    • entry term="apium graveolens co2 extract"
    • entry term="apium graveolens egypt"
    • entry term="apium graveolens essence"
    • entry term="apium graveolens extract"
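
A rough Python sketch of how the whitespace, diacritic, plural, and exact-duplicate checks above might look. This is not the existing validator's code: the function names (`strip_diacritics`, `normalize_term`, `exact_duplicates`, `plural_key`) are hypothetical, "normalize diacritics" is assumed to mean folding accented letters to their base letters, and the plural rule is a deliberately naive "uppercase abbreviation plus trailing s" check.

```python
import re
import unicodedata
from collections import Counter


def strip_diacritics(text: str) -> str:
    """Fold accented letters to base letters (assumed meaning of "normalize")."""
    # NFD splits "é" into "e" plus a combining accent; drop the combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_term(term: str) -> str:
    """Apply the diacritic and whitespace rules to one term value."""
    term = strip_diacritics(term)
    term = re.sub(r" {2,}", " ", term)  # runs of spaces become one space
    return term.strip(" ")              # no spaces next to the quotes


def exact_duplicates(terms):
    """Return normalized terms that occur more than once, compared in full."""
    counts = Counter(normalize_term(t) for t in terms)
    return sorted(t for t, n in counts.items() if n > 1)


def plural_key(abbreviation: str) -> str:
    """Naive rule: 'CHGs' and 'CHG' share a key; ordinary words are untouched."""
    stem = abbreviation[:-1]
    if abbreviation.endswith("s") and stem.isupper():
        return stem
    return abbreviation


terms = [
    "apium graveolens co2 extract",
    "apium graveolens egypt",
    "apium graveolens essence",
    "apium graveolens extract",
]
print(exact_duplicates(terms))                                  # []
print(exact_duplicates(terms + ["apium  graveolens  egypt "]))  # ['apium graveolens egypt']
print(plural_key("CHGs") == plural_key("CHG"))                  # True
```

Because the comparison uses the whole normalized value of @term, the four "apium graveolens ..." entries are not reported as duplicates just because they share a prefix.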