petermr/CEVOpen

📕 Documentation: Dictionary.xml and DictionaryDescription.md of: eoPlant

Here we describe the process of creating a [DictionaryName]DictionaryDescription.md document, which describes the contents of the individual dictionary named in the title of this issue. That dictionary was created (or is in the process of being created) from data collected for Oil186.

I will begin this thread by pasting the contents of the INDEX description, followed by a first draft below for discussion and direction.

Plants

Layman and Botanical Names / Species

 

PlantOilDictionaryDescription.md

  • Description: A dictionary of 1678 constituent chemical compounds extracted from the Essential Oils of **[XX] plants** mentioned in the 186 test articles downloaded from PubMed. Of the 1678 entries, ?????? had their names normalized and tagged with corresponding Wikidata IDs; the other 112 remain to be resolved.

  • Filename: PlantOil.xml

  • File Location: https://github.com/petermr/CEVOpen/blob/master/dictionary/plant/PlantOil.xml

Plant Essential Oils Dictionary

 

Description

A dictionary of 1678 constituent chemical compounds extracted from the Essential Oils of **[XX] plants** mentioned in the 186 test articles downloaded from PubMed. Of the 1678 entries, ?????? had their names normalized and tagged with corresponding Wikidata IDs; the other 112 remain to be resolved.

 

File Data

 

Table Column Headings

  • title: type of data to be normalized and tagged with Wikidata ID.

  • desc: data source

  • id: ID corresponding to the row number in this table (not counting the first two column-heading rows)

  • term: The term is the precise string used to identify the concept.

  • wikidata: Unique identifier linked to Wikidata.org — a free and open knowledge base that can be read and edited by both humans and machines.

 

Contents/Results

  • No. of source papers: 186

  • No. of Entries (Headers are not counted): 1678

  • No. of unique plant names (including alternate spellings or synonyms): 1678 - DUPLICATES = ??

  • No. of Chemical Compounds resolved in Wikidata: ????

  • No. of Chemical Compounds NOT resolved in Wikidata: 112
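The open figures above could be recomputed from the dictionary file rather than counted by hand. The following is a minimal Python sketch, assuming each row is stored as an `<entry>` element with `id`, `term`, and `wikidata` attributes. That layout is inferred from the column headings and has not been checked against PlantOil.xml; the sample XML, the placeholder Wikidata value, and the function name `count_entries` are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the real file's element and attribute names may differ.
SAMPLE = """<dictionary title="eoPlant">
  <entry id="1" term="Citrus limon" wikidata="Q_PLACEHOLDER_1"/>
  <entry id="2" term="Salvia sclarea" wikidata=""/>
  <entry id="3" term="Mentha piperita"/>
</dictionary>"""


def count_entries(xml_text):
    """Count all entries and split them by whether a Wikidata ID is present."""
    root = ET.fromstring(xml_text)
    entries = root.findall(".//entry")
    # An entry counts as resolved only if its wikidata attribute is non-empty.
    resolved = [e for e in entries if (e.get("wikidata") or "").strip()]
    return {
        "entries": len(entries),
        "resolved": len(resolved),
        "unresolved": len(entries) - len(resolved),
    }


print(count_entries(SAMPLE))
```

To run this against the real dictionary, read the file contents into a string and pass that to `count_entries` in place of `SAMPLE`.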

 

Notes:

  • I propose we rename this Dictionary to “Plants”. (If we do, will we need to update links elsewhere in GitHub, or will they resolve themselves?)

  • I’m not sure how to deal with the duplicates… do we re-serialize the “id”?

  • Once the duplicates are removed, please update the figures in this document description as well as the results.

  • Duplicates (some of them more than once): 48

    • (See oil OilPlantCopy.xlsx for duplicates I’ve highlighted in yellow boxes.)
    1. Artemisia lobelii

    2. Artemisia roxburghiana

    3. Artemisia caerulescens

    4. Bunium persicum

    5. Cedrus libani

    6. Citrus limon

    7. Citrus paradisii

    8. Citrus sinensis

    9. Conyza canadensis

    10. Curcuma longa

    11. Cymbopogon citratus

    12. Cymbopogon nardus

    13. Hedychium gardnerianum

    14. Hyssopus officinalis

    15. Helichrysum stoechas

    16. Lavandula latifolia

    17. Melaleuca alternifolia

    18. Mentha piperita

    19. Myrtus communis

    20. Micromeria cristata

    21. Nepeta betonicifolia

    22. Neolitsea dealbata

    23. Ocimum basilicum

    24. Ocimum micranthum

    25. Pistacia lentiscus

    26. Pourouma cecropiifolia

    27. Psidium guajava

    28. Psidium guineense

    29. Rosmarinus officinalis

    30. Salvia aucheri

    31. Salvia euphratica

    32. Salvia moorcroftiana

    33. Salvia officinalis

    34. Salvia sclarea

    35. Sideritis raeseri

    36. Sphaerantia discolor

    37. Tanacetum cadmeum

    38. Tanacetum polycephalum

    39. Teucrium chamaedrys

    40. Teucrium montanum

    41. Valeriana officinalis

    42. Vetiveria zizanioides
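On the question of duplicates and re-serializing the "id": one possible approach is sketched below in Python. It assumes the entries are direct `<entry>` children of the root element with `id` and `term` attributes, and that two terms count as duplicates when they match after trimming whitespace and ignoring case. Both are assumptions, not confirmed against the actual file; the sample data and the function names `find_duplicates` and `dedupe_and_renumber` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample data; the real file's layout may differ.
SAMPLE = """<dictionary title="eoPlant">
  <entry id="1" term="Citrus limon"/>
  <entry id="2" term="Salvia sclarea"/>
  <entry id="3" term="citrus limon"/>
  <entry id="4" term="Citrus limon"/>
</dictionary>"""


def normalize(term):
    # Assumption: duplicates are compared case-insensitively, ignoring outer whitespace.
    return term.strip().lower()


def find_duplicates(root):
    """Return {normalized term: occurrence count} for terms appearing more than once."""
    counts = Counter(normalize(e.get("term", "")) for e in root.findall("entry"))
    return {term: n for term, n in counts.items() if n > 1}


def dedupe_and_renumber(root):
    """Keep the first occurrence of each term, then renumber ids from 1."""
    seen = set()
    for entry in list(root.findall("entry")):
        key = normalize(entry.get("term", ""))
        if key in seen:
            root.remove(entry)
        else:
            seen.add(key)
    for new_id, entry in enumerate(root.findall("entry"), start=1):
        entry.set("id", str(new_id))
    return root


root = ET.fromstring(SAMPLE)
print(find_duplicates(root))
dedupe_and_renumber(root)
print([(e.get("id"), e.get("term")) for e in root.findall("entry")])
```

Renumbering keeps the ids contiguous but changes the id of every entry after the first removed duplicate, so anything that refers to entries by id would need updating. Leaving gaps in the ids is the alternative.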

As of today, I believe this dictionary and its description document are complete. Below I will copy the contents of the description document:

EO Plant Dictionary

 

File Data

 

Table Column Headings

  • id: serialized identification number

  • term: The term is the precise string used to identify the concept.

  • wikidata: Unique identifier linked to Wikidata.org — a free and open knowledge base that can be read and edited by both humans and machines.

  • desc: data source

 

Contents/Results

  • No. of source papers: 186

  • No. of Entries (Headers are not counted): 1678

  • No. of unique plant names (including alternate spellings or synonyms): 1678

  • No. of entries resolved to wikidataIDs: 1567

 

 

Notes: