plazi/ggxml2taxpub

description of level-1


@tcatapano do we have a description of level-1 somewhere, following issue #17? Let me know so I can document it, since more people are getting involved in this discussion.

We do not have a description of the minimal "level-1" encoding. However, briefly, it includes:

treatment-meta

  • mixed-citation, including
    -- treatment title tagged as named-content
    -- zenodo treatment DOI and treatmentbank uri tagged as uri
    -- parent publication article-title
    -- parent publication doi tagged as uri

treatment

  • nomenclature
    • name
  • treatment-sec (with uncontrolled sec-types taken verbatim from source ggxml)

At the phrase level inside treatment secs, taxon names and material-citation strings are encoded.

A formal expression of the de facto schema, derived from the markup in the current 500-instance sample, is:

default namespace = ""
namespace tp = "http://www.plazi.org/taxpub"

start =
  element tp:taxon-treatment {
    element tp:treatment-meta {
      element mixed-citation {
        element named-content {
          attribute content-type { xsd:NCName },
          text
        },
        (element article-title { text }
         | element uri {
             attribute content-type { xsd:NCName },
             xsd:anyURI
           })+
      }
    },
    element tp:nomenclature { taxon-name },
    treatment-sec*
  }
taxon-name = element tp:taxon-name { text }
treatment-sec =
  element tp:treatment-sec {
    attribute sec-type { xsd:NCName },
    (treatment-sec
     | element p {
         (text
          | taxon-name
          | element tp:material-citation { (text | taxon-name)+ })+
       })*
  }
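
For illustration only, here is a minimal sketch of an instance that should fit the schema above, with a coarse structural check written against Python's standard library. The sample was written by hand and is not output of the converter: the title, URIs, content-type values and sec-type are placeholders, and `check_level1` is a hypothetical helper. It only checks the outer shape, so it is not a substitute for validating against the RELAX NG schema.

```python
import xml.etree.ElementTree as ET

TP = "http://www.plazi.org/taxpub"
NS = {"tp": TP}

# Hypothetical level-1 instance; every value below is a placeholder.
SAMPLE = """\
<tp:taxon-treatment xmlns:tp="http://www.plazi.org/taxpub">
  <tp:treatment-meta>
    <mixed-citation>
      <named-content content-type="treatment-title">Aus bus Author, 1900</named-content>
      <uri content-type="treatment-doi">https://example.org/treatment-doi</uri>
      <article-title>Placeholder parent article title</article-title>
      <uri content-type="publication-doi">https://example.org/publication-doi</uri>
    </mixed-citation>
  </tp:treatment-meta>
  <tp:nomenclature><tp:taxon-name>Aus bus</tp:taxon-name></tp:nomenclature>
  <tp:treatment-sec sec-type="description">
    <p>Workers of <tp:taxon-name>Aus bus</tp:taxon-name> are small.
      <tp:material-citation>1 worker, placeholder locality</tp:material-citation></p>
  </tp:treatment-sec>
</tp:taxon-treatment>
"""


def check_level1(root):
    """Return a list of problems; an empty list means the coarse shape matches."""
    problems = []
    if root.tag != f"{{{TP}}}taxon-treatment":
        problems.append("root is not tp:taxon-treatment")

    # treatment-meta, then nomenclature, then zero or more treatment-sec.
    children = [child.tag for child in root]
    if children[:2] != [f"{{{TP}}}treatment-meta", f"{{{TP}}}nomenclature"]:
        problems.append("expected tp:treatment-meta followed by tp:nomenclature")
    if any(tag != f"{{{TP}}}treatment-sec" for tag in children[2:]):
        problems.append("only tp:treatment-sec may follow tp:nomenclature")

    # mixed-citation: one named-content, then one or more article-title / uri.
    cit = root.find("tp:treatment-meta/mixed-citation", NS)
    if cit is None or len(cit) < 2 or cit[0].tag != "named-content":
        problems.append("mixed-citation must start with named-content")
    elif any(el.tag not in ("article-title", "uri") for el in cit[1:]):
        problems.append("unexpected element after named-content")

    # Every treatment-sec, nested or not, carries a sec-type attribute.
    for sec in root.iter(f"{{{TP}}}treatment-sec"):
        if "sec-type" not in sec.attrib:
            problems.append("tp:treatment-sec without sec-type")
    return problems


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print(check_level1(root))
    # Phrase-level markup: taxon names tagged anywhere in the treatment.
    print([name.text for name in root.iter(f"{{{TP}}}taxon-name")])
```

Note that `mixed-citation`, `named-content`, `uri`, `article-title` and `p` are in no namespace (the schema declares `default namespace = ""`), while the treatment elements are in the `tp` namespace.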

@tcatapano should we add a comment in the metadata defining the level of TaxPub, e.g. level=1?

I think that would be a clever move.

@myrmoteras I agree it is a good idea to have an indication of the level of TaxPub markup in the instance metadata. See #32