📕 Documentation: Dictionary.xml and DictionaryDescription.md of: eoPlant
Here we describe the process of creating a [DictionaryName]DictionaryDescription.md document, within which we will describe the contents of the individual dictionary (named in the title of this Issue), which was created (or is in the process of being created) from data collected for Oil186.
I will begin this thread by pasting the contents of the INDEX description, followed by a first draft copy below for discussion and direction.
Plants
Layman and Botanical Names / Species
PlantOilDictionaryDescription.md
- Description: A dictionary of 1678 constituent chemical compounds extracted from the Essential Oils of **[XX] plants** mentioned in the 186 test articles downloaded from PubMed. Of the 1678 entries, ?????? had their names normalized and tagged with corresponding Wikidata IDs, the other 112 remain to be resolved.
- Filename: PlantOil.xml
- File Location: https://github.com/petermr/CEVOpen/blob/master/dictionary/plant/PlantOil.xml
Plant Essential Oils Dictionary
Description
A dictionary of 1678 constituent chemical compounds extracted from the Essential Oils of **[XX] plants** mentioned in the 186 test articles downloaded from PubMed. Of the 1678 entries, ?????? had their names normalized and tagged with corresponding Wikidata IDs, the other 112 remain to be resolved.
File Data
- Filename: PlantOil.xml
- File Location: https://github.com/petermr/CEVOpen/blob/master/dictionary/plant/PlantOil.xml
Table Column Headings
- title: type of data to be normalized and tagged with Wikidata ID.
- desc: data source
- id: ID correlated to the row number of this table (not including first two column heading rows)
- term: The term is the precise string used to identify the concept.
- wikidata: Unique identifier linked to Wikidata.org, a free and open knowledge base that can be read and edited by both humans and machines.
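To make the column headings concrete, here is a minimal sketch of reading such a dictionary into rows. It assumes a layout in which `title` is an attribute of the root element, `desc` is a child element, and each entry carries `id`, `term` and `wikidata` attributes. That layout, the sample content and the placeholder IDs are assumptions for illustration and may not match the actual PlantOil.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: element and attribute names mirror the column
# headings above, not necessarily the real file. The Q-numbers are
# placeholders, not real Wikidata items.
SAMPLE = """\
<dictionary title="plant">
  <desc>Plant names from the 186 test articles</desc>
  <entry id="1" term="Citrus limon" wikidata="Q00001"/>
  <entry id="2" term="Curcuma longa" wikidata=""/>
</dictionary>
"""


def read_rows(xml_text):
    """Return (title, desc, rows); each row is a dict of id, term, wikidata."""
    root = ET.fromstring(xml_text)
    title = root.get("title")
    desc = root.findtext("desc")
    rows = [
        {
            "id": int(entry.get("id")),
            "term": entry.get("term"),
            # A missing or empty attribute is treated as "not yet resolved".
            "wikidata": entry.get("wikidata") or None,
        }
        for entry in root.iter("entry")
    ]
    return title, desc, rows


title, desc, rows = read_rows(SAMPLE)
for row in rows:
    print(row["id"], row["term"], row["wikidata"])
```

Keeping unresolved entries as `None` rather than dropping them makes it easy to count them later.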
Contents/Results
- No. of source papers: 186
- No. of Entries (Headers are not counted): 1678
- No. of unique plant names (including alternate spellings or synonyms): 1678 - DUPLICATES = ??
- No. of Chemical Compounds resolved in Wikidata: ????
- No. of Chemical Compounds NOT resolved in Wikidata: 112
Notes:
- I propose we rename this Dictionary to “Plants”. (If we do, will we need to update links elsewhere in GitHub, or will they resolve themselves?)
- I’m not sure how to deal with the duplicates… do we re-serialize the “id”?
- Once the duplicates are removed, please update the figures in this document description as well as the results.
- Duplicates (some of them more than once): 48
  - (See oil OilPlantCopy.xlsx for duplicates I’ve highlighted in yellow boxes.)
  - Artemisia lobelii
  - Artemisia roxburghiana
  - Artemisia caerulescens
  - Bunium persicum
  - Cedrus libani
  - Citrus limon
  - Citrus paradisii
  - Citrus sinensis
  - Conyza canadensis
  - Curcuma longa
  - Cymbopogon citratus
  - Cymbopogon nardus
  - Hedychium gardnerianum
  - Hyssopus officinalis
  - Helichrysum stoechas
  - Lavandula latifolia
  - Melaleuca alternifolia
  - Mentha piperita
  - Myrtus communis
  - Micromeria cristata
  - Nepeta betonicifolia
  - Neolitsea dealbata
  - Ocimum basilicum
  - Ocimum micranthum
  - Pistacia lentiscus
  - Pourouma cecropiifolia
  - Psidium guajava
  - Psidium guineense
  - Rosmarinus officinalis
  - Salvia aucheri
  - Salvia euphratica
  - Salvia moorcroftiana
  - Salvia officinalis
  - Salvia sclarea
  - Sideritis raeseri
  - Sphaerantia discolor
  - Tanacetum cadmeum
  - Tanacetum polycephalum
  - Teucrium chamaedrys
  - Teucrium montanum
  - Valeriana officinalis
  - Vetiveria zizanioides
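One possible answer to the re-serialization question in the notes above is to keep the first occurrence of each term and renumber the remaining entries from 1. The sketch below is only an illustration: the rows and the `normalize` rule (case-insensitive, whitespace-collapsed comparison) are assumptions, not the procedure actually used on this dictionary.

```python
from collections import Counter

# Hypothetical rows standing in for dictionary entries.
rows = [
    {"id": 1, "term": "Citrus limon"},
    {"id": 2, "term": "Salvia officinalis"},
    {"id": 3, "term": "Citrus limon"},
    {"id": 4, "term": "Curcuma longa"},
    {"id": 5, "term": "citrus limon "},
]


def normalize(term):
    """Collapse whitespace and case so near-identical spellings compare equal."""
    return " ".join(term.split()).casefold()


def find_duplicates(rows):
    """Map each normalized term that occurs more than once to its count."""
    counts = Counter(normalize(row["term"]) for row in rows)
    return {term: n for term, n in counts.items() if n > 1}


def drop_duplicates_and_renumber(rows):
    """Keep the first occurrence of each term and assign ids 1..N in order."""
    seen = set()
    kept = []
    for row in rows:
        key = normalize(row["term"])
        if key in seen:
            continue
        seen.add(key)
        kept.append({**row, "id": len(kept) + 1})
    return kept


duplicates = find_duplicates(rows)
cleaned = drop_duplicates_and_renumber(rows)
print(duplicates)
print([(row["id"], row["term"]) for row in cleaned])
```

Renumbering changes existing ids, so anything that refers to an entry by its old id would need to be updated as well.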
As of today, I believe this dictionary and its description document are complete. Below I will copy the contents of the description document:
EO Plant Dictionary
File Data
- Description: A dictionary of 1678 plant names extracted from the 186 test articles downloaded from PubMed. Of the 1678 entries, 1567 had their names normalized and tagged with corresponding Wikidata IDs.
- Filename: eoPlant.xml
- File Location: https://github.com/petermr/CEVOpen/blob/master/dictionary/eoPlant/eoPlant.xml
Table Column Headings
- id: serialized identification number
- term: The term is the precise string used to identify the concept.
- wikidata: Unique identifier linked to Wikidata.org, a free and open knowledge base that can be read and edited by both humans and machines.
- desc: data source
Contents/Results
- No. of source papers: 186
- No. of Entries (Headers are not counted): 1678
- No. of unique plant names (including alternate spellings or synonyms): 1678
- No. of entries resolved to wikidataIDs: 1567
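The figures above can be rechecked whenever the dictionary changes. The following is a rough sketch that assumes the entries have already been loaded as a list of dicts with `term` and `wikidata` keys. The rows and placeholder IDs are hypothetical. With the reported figures, 1678 entries and 1567 resolved leave 111 entries without a Wikidata ID.

```python
# Hypothetical rows; "wikidata" is None where no ID has been assigned yet.
rows = [
    {"id": 1, "term": "Citrus limon", "wikidata": "Q00001"},
    {"id": 2, "term": "Curcuma longa", "wikidata": None},
    {"id": 3, "term": "Salvia sclarea", "wikidata": "Q00002"},
]


def summarize(rows):
    """Compute the Contents/Results figures for a list of entries."""
    entries = len(rows)
    resolved = sum(1 for row in rows if row["wikidata"])
    return {
        "entries": entries,
        "unique_terms": len({row["term"] for row in rows}),
        "resolved": resolved,
        "unresolved": entries - resolved,
    }


summary = summarize(rows)
print(summary)

# The same subtraction applied to the figures reported in this description.
reported_unresolved = 1678 - 1567
print(reported_unresolved)
```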