petermr/CEVOpen

📕 Documentation: Dictionary.xml and DictionaryDescription.md of: eoActivity

Opened this issue · 9 comments

Here we describe the process of creating a [DictionaryName]DictionaryDescription.md document. It describes the contents of the individual dictionary named in the title of this Issue, which was created (or is in the process of being created) from data collected for Oil186.

I will begin this thread by pasting the contents of the INDEX description, followed by a first-draft copy below for discussion and direction.

 EO Activities

ActivityDictionaryDescription.md

Activity Dictionary

 

A dictionary of 184 activities mentioned in the 186 test articles downloaded from PubMed.

 

File Data

 

Table Column Headings

  • title: type of data to be normalized and tagged with Wikidata ID.

  • desc: data source

  • id: CM.activities.n where n is a serialized number

  • name: The name is a human readable string describing the concept.

  • term: The term is the precise string used to identify the concept. Name and Term are often the same.

  • wikidata: Unique identifier linked to Wikidata.org — a free and open knowledge base that can be read and edited by both humans and machines.

  • wikipedia:
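To make the headings above concrete, here is a minimal sketch of how one entry might be assembled. The `dictionary`/`entry` element names, the attribute layout, the helper names and the choice to start numbering at 1 are all assumptions for illustration; they are not taken from the actual activities.xml file.

```python
import xml.etree.ElementTree as ET


def make_entry(n, name, term=None, wikidata="", wikipedia=""):
    """Build one <entry>; element and attribute names are assumed, not confirmed."""
    return ET.Element("entry", {
        "id": f"CM.activities.{n}",                  # id format described above
        "name": name,                                # human-readable string
        "term": term if term is not None else name,  # often identical to name
        "wikidata": wikidata,                        # left empty when unresolved
        "wikipedia": wikipedia,
    })


def make_dictionary(title, desc, names):
    """Wrap entries in a root element carrying the title and desc fields."""
    root = ET.Element("dictionary", {"title": title})
    ET.SubElement(root, "desc").text = desc
    for n, name in enumerate(names, start=1):  # starting at 1 is an assumption
        root.append(make_entry(n, name))
    return root


root = make_dictionary("activities", "hypothetical data source",
                       ["antibacterial", "antifungal"])
xml_text = ET.tostring(root, encoding="unicode")
```

Printing `xml_text` shows the serialized result, which may help when comparing heading names across dictionaries.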

 

Contents/Results

  • No. of source papers: 186

  • No. of Entries (Headers are not counted): 184

  • No. of unique compound names (including alternate spellings or synonyms): 184

  • No. of Chemical Compounds resolved in Wikidata: 74

  • No. of Chemical Compounds NOT resolved in Wikidata: 110

 

Notes:

  • No source papers are listed. Should we assume 186, or delete that from Contents/Results?

  • We need to normalize the headings across all Dictionaries

    • This is the third case where the column heading “description” means something other than "data source / method of input"

    • Capitalization

  • In this case, is the column heading “id” related to Essoil? I don’t know how to describe it here. The format is: CM.activities.n where n is a serialized number

  • I don’t know how to describe the column headings for “Wikipedia” here

@petermr Currently working on cleaning the activities.xml dictionary.

Searching Wikidata for “antiacne” I found this entry:

https://www.wikidata.org/wiki/Q143139 "therapeutic subgroup of the Anatomical Therapeutic Chemical Classification System: Anti-acne preparations"

which led me to search and find this:

https://www.wikidata.org/wiki/Q192093 "classification of active ingredients of drugs according to the organ or system on which they act and their therapeutic, pharmacological and chemical properties."

and this: https://en.wikipedia.org/wiki/Anatomical_Therapeutic_Chemical_Classification_System

Questions:

  1. In the absence of a wikidata ID for "antiacne", should I...
    a) use no id at all
    b) use https://www.wikidata.org/wiki/Q143139
    c) use the ID for "acne" and let users put 2 and 2 together about the "anti-" part?

  2. Should we be adding the Anatomical_Therapeutic_Chemical_Classification_System’s IDs to the activities dictionary as well as wikidata?
    https://www.whocc.no/atc_ddd_index/
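For the manual searches described above, a rough sketch of listing Wikidata candidates for a term through the public `wbsearchentities` API action may save some clicking. The function names and the User-Agent string are made up for illustration, and a human would still need to choose among the candidates (or decide that none fits, as with "antiacne").

```python
import json
import urllib.parse
import urllib.request

API = "https://www.wikidata.org/w/api.php"


def search_url(term, language="en", limit=5):
    """Build a wbsearchentities query URL for one dictionary term."""
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "limit": limit,
        "format": "json",
    }
    return API + "?" + urllib.parse.urlencode(params)


def parse_hits(payload):
    """Reduce the JSON response to (id, label, description) tuples."""
    return [
        (hit["id"], hit.get("label", ""), hit.get("description", ""))
        for hit in payload.get("search", [])
    ]


def candidates(term):
    """Fetch candidates over the network; the User-Agent value is a placeholder."""
    request = urllib.request.Request(
        search_url(term),
        headers={"User-Agent": "dictionary-cleanup-sketch/0.1"},
    )
    with urllib.request.urlopen(request) as response:
        return parse_hits(json.load(response))
```

Calling `candidates("antiacne")` requires network access and returns whatever Wikidata currently holds, so the result is not reproduced here.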

Incidentally, the WHO Collaborating Centre for Drug Statistics Methodology also has standard abbreviations for the following, which may be useful as dictionaries as well.

Units

g = gram
mg = milligram
mcg = microgram
U = unit
TU = thousand units
MU = million units
mmol = millimole
ml = milliliter (e.g. eyedrops)

Route of administration (Adm.R)

Implant = Implant
Inhal = Inhalation
Instill = Instillation
N = nasal
O = oral
P = parenteral
R = rectal
SL = sublingual/buccal/oromucosal
TD = transdermal
V = vaginal
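As a sketch of how the two abbreviation lists above could be held as lookup tables before being turned into dictionaries, assuming plain Python mappings are acceptable as an intermediate form (the names `UNITS`, `ROUTES` and `expand` are illustrative only):

```python
# Abbreviations copied from the lists above.
UNITS = {
    "g": "gram",
    "mg": "milligram",
    "mcg": "microgram",
    "U": "unit",
    "TU": "thousand units",
    "MU": "million units",
    "mmol": "millimole",
    "ml": "milliliter",
}

ROUTES = {
    "Implant": "Implant",
    "Inhal": "Inhalation",
    "Instill": "Instillation",
    "N": "nasal",
    "O": "oral",
    "P": "parenteral",
    "R": "rectal",
    "SL": "sublingual/buccal/oromucosal",
    "TD": "transdermal",
    "V": "vaginal",
}


def expand(code, table):
    """Return the full form of an abbreviation, or the code itself if unknown."""
    return table.get(code, code)
```

Lookups are case-sensitive here, which keeps `MU` (million units) from being confused with any lowercase variant.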

Ok, I will add new entries as I go. If too time-consuming, I’ll swing back and do it after the dictionaries are cleaned, and then update them accordingly.

I have just finished uploading the cleaned, disambiguated and Wikidata-attributed activities dictionary, and updated its description, as well as the master INDEX of descriptions.

ActivityDictionaryDescription.md

Hallelujah.

activity.xml and ActivityDictionaryDescription.md are now updated and working.

I have also updated master INDEXofOIL186Dictionaries.md

As of today, I believe this dictionary and its description document are complete. Below I will copy the contents of the description document:

EO Activity Dictionary

 

File Data

 

Table Column Headings

  • id: serialized identification number

  • term: The term is a human readable string describing the concept.

  • wikidataID: Unique identifier linked to Wikidata.org — a free and open knowledge base that can be read and edited by both humans and machines.

  • description: short description of the activity sourced from wikidata and/or wikipedia

 

Contents/Results

  • No. of source papers: 186

  • No. of entries (Headers are not counted): 438

  • No. of unique activity names (including alternate spellings or synonyms): 438

  • No. of activities resolved in wikidata (including alternate spellings or synonyms): 340

  • Number of unique wikidata ids attributed to activities (normalizing for alternate spellings and synonyms): 250

  • No. of entries without wikidataID: 98

  • No. of entries with descriptions: 336

  • No. of entries without descriptions: 102
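The counts above are internally consistent (340 + 98 = 438 and 336 + 102 = 438). A minimal sketch of how such figures could be recomputed from the XML rather than counted by hand follows. It assumes each entry is an `entry` element carrying `term`, `wikidataID` and `description` attributes; the sample data, including the `Q_EXAMPLE_1` identifier, is invented purely for illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample; the real activity.xml may be laid out differently.
SAMPLE = """<dictionary title="eoActivity">
  <entry id="1" term="antibacterial" wikidataID="Q_EXAMPLE_1" description="placeholder"/>
  <entry id="2" term="anti-bacterial" wikidataID="Q_EXAMPLE_1" description="placeholder"/>
  <entry id="3" term="unresolved-example"/>
</dictionary>"""


def summarize(root):
    """Recompute the Contents/Results counts from the entries."""
    entries = root.findall("entry")
    ids = [e.get("wikidataID") for e in entries if e.get("wikidataID")]
    described = sum(1 for e in entries if e.get("description"))
    return {
        "entries": len(entries),
        "unique_terms": len({e.get("term") for e in entries}),
        "with_wikidata": len(ids),
        "unique_wikidata_ids": len(set(ids)),  # synonyms share one ID
        "without_wikidata": len(entries) - len(ids),
        "with_description": described,
        "without_description": len(entries) - described,
    }


stats = summarize(ET.fromstring(SAMPLE))
```

For a file on disk, `ET.parse(path).getroot()` could be passed to `summarize` instead of the inline sample.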

 

Notes:

  •