/vascan-traditional-use

Enhancing a dataset of traditional use of medicinal plants of Canada


Traditional use of medicinal plants in Canada

Rationale

The paper:

Uprety Y, Asselin H, Dhakal A, Julien N. 2012. Traditional use of medicinal plants in the boreal forests of Canada: review and perspectives. J Ethnobiol Ethnomed. 2012;8:7. doi: 10.1186/1746-4269-8-7.

... contains a fantastic dataset in its supplementary data on the traditional medicinal use and vernacular names of plants in Canada. Both the paper and its supplementary data are published under a Creative Commons Attribution license. The data, however, are provided as a Word file and are thus not readily usable.

Challenge

Over the four days of the #BIH13 conference, we will attempt to transform the data into a usable CSV file and link them to the Database of Vascular Plants of Canada (VASCAN), in which @peterdesmet is involved.

Result

We managed to translate the dataset into a Darwin Core Archive within the timeframe of the conference. See "Steps" below for the full details.

Steps

  1. Copy/paste the Word table to a CSV file.

  2. Get the data for one record (a taxon) on one line. [script]

  3. Fix some formatting (mostly manually). [script]

  4. Run the scientificName through the GBIF name parser and try to match the returned genus, specificEpithet, infraspecificEpithet and taxonRank with data from VASCAN. [script]

    Of the 545 names, 493 had one exact match, 48 no match, and 4 several matches. We tried to explain the mismatches here.

  5. Realize that there are too many vernacular name languages (14), and especially too many plant parts used (129), to express the data in a flat file. Express them as a Darwin Core Archive instead. [target format file]

  6. Express the scientificName, family and mapping to VASCAN in a Taxon Core. We also included the non-DwC term _habit.

  7. Express the vernacular names in a VernacularName extension. [script] Languages are mapped to their ISO 639-3 code (the ISO 639-1 code as requested in dwc:language does not capture all languages). [mapping file] We also included the non-DwC term _languageName.

  8. Express the traditional medicinal use in a Description extension. Currently, the full description includes parts, uses and sources, but is not marked up as HTML. We also included the non-DwC term _plantPart, whose values are reconciled. [mapping file]

  9. Add a meta.xml file. [file]

  10. Catch up on sleep.
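
The name matching in step 4 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the repository's actual script: the `VASCAN_TAXA` records, their `taxonID` values, and the `PARSED_NAMES` entries are all made up, and the parsed names are hand-written dicts standing in for what a name parser might return. Only the four compared fields come from the step description.

```python
from collections import Counter

# Fields compared between a parsed name and a VASCAN record (see step 4).
MATCH_FIELDS = ("genus", "specificEpithet", "infraspecificEpithet", "taxonRank")

# Hypothetical stand-in for a VASCAN export; the taxonID values are made up.
VASCAN_TAXA = [
    {"taxonID": "example-1", "genus": "Abies", "specificEpithet": "balsamea",
     "infraspecificEpithet": "", "taxonRank": "species"},
    {"taxonID": "example-2", "genus": "Achillea", "specificEpithet": "millefolium",
     "infraspecificEpithet": "", "taxonRank": "species"},
    # Same name parts under a second ID, to illustrate the "several matches" case.
    {"taxonID": "example-3", "genus": "Achillea", "specificEpithet": "millefolium",
     "infraspecificEpithet": "", "taxonRank": "species"},
]


def name_key(record):
    """Build a case-insensitive comparison key from the four name parts."""
    return tuple((record.get(field) or "").strip().lower() for field in MATCH_FIELDS)


def build_index(taxa):
    """Map each name key to the list of taxonIDs that share it."""
    index = {}
    for taxon in taxa:
        index.setdefault(name_key(taxon), []).append(taxon["taxonID"])
    return index


def match_name(parsed, index):
    """Classify a parsed name as having one exact, no, or several matches."""
    ids = index.get(name_key(parsed), [])
    if len(ids) == 1:
        return "exact", ids
    if not ids:
        return "none", ids
    return "several", ids


# Hand-written dicts standing in for name parser output (assumed shape).
PARSED_NAMES = [
    {"genus": "Abies", "specificEpithet": "balsamea", "taxonRank": "species"},
    {"genus": "Achillea", "specificEpithet": "millefolium", "taxonRank": "species"},
    # Deliberately fake name, to illustrate the "no match" case.
    {"genus": "Examplea", "specificEpithet": "nonexistens", "taxonRank": "species"},
]

index = build_index(VASCAN_TAXA)
results = [match_name(parsed, index) for parsed in PARSED_NAMES]
summary = Counter(status for status, _ in results)
print(dict(summary))
```

Keeping the three outcomes separate mirrors the counts reported in step 4 and leaves the unmatched and ambiguous names available for manual review.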