gbif/pipelines

Interpret new (July 2023) Darwin Core terms


The recent Darwin Core release has introduced 8 new terms we will need to support:

  • parentMeasurementID will only be in the measurement extension
  • verbatimLabel doesn't need interpretation, but we should aim to preserve newline characters. That isn't possible with the tab-separated text files we use in our DwC-A downloads.
  • caste may need interpretation with a vocabulary, but DwC doesn't specify one
  • vitality may need interpretation with a vocabulary. DwC is/was working on one, but hasn't released it
  • eventType was already being tested, see #702.
  • superfamily, tribe and subtribe don't need interpretation, at least until ChecklistBank/the backbone supports them.

For verbatimLabel, the best we can do in the TSV export might be to escape newlines as \n. The same escape might already be found in the JSON of dynamicProperties.

ILL: Union Co. Wolf Lake by Powder Plant Bridge. 1 March 1975 Coll. S. Ketzler, S. Herbert\nMonotoma longicollis 4 ♂ Det TC McElrath 2018\nINHS Insect Collection 456782\n

(From https://dwc.tdwg.org/examples/verbatimLabel).
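The escaping proposed above could look roughly like the following. This is only a sketch in Python, not the pipelines code: the function name `escape_newlines` and the record ID `occ-1` are made up for illustration, and normalising `\r\n` and `\r` to a single escaped `\n` is an assumption. Note that a value which already contains a literal backslash followed by `n` cannot be told apart from an escaped newline afterwards.

```python
def escape_newlines(value: str) -> str:
    """Replace real line breaks with the two characters backslash + n."""
    # Assumption: Windows (\r\n) and bare \r line endings are normalised first.
    normalised = value.replace("\r\n", "\n").replace("\r", "\n")
    return normalised.replace("\n", "\\n")


# The verbatimLabel example from the Darwin Core documentation, with real newlines.
label = (
    "ILL: Union Co. Wolf Lake by Powder Plant Bridge. "
    "1 March 1975 Coll. S. Ketzler, S. Herbert\n"
    "Monotoma longicollis 4 ♂ Det TC McElrath 2018\n"
    "INHS Insect Collection 456782\n"
)

# "occ-1" is a hypothetical record ID; the row now fits on one physical line.
row = "\t".join(["occ-1", escape_newlines(label)])
print(row)
```

The escaped value keeps all three line breaks as visible `\n` sequences, so a consumer who knows the convention can restore them.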

> For verbatimLabel, the best we can do in TSV export might be to escape newlines as \n.

I agree this is the best option. Anything else would be further from the original.

Note that this affects the IPT too, as it produces DwC-As with this structure:

<archive xmlns="http://rs.tdwg.org/dwc/text/" metadata="eml.xml">
  <core encoding="UTF-8" fieldsTerminatedBy="\t" linesTerminatedBy="\n" fieldsEnclosedBy="" ignoreHeaderLines="1" rowType=...
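To illustrate why this structure cannot carry a raw newline, here is a small Python sketch. It assumes a naive reader that follows only the settings shown (tab-separated fields, `\n`-terminated lines, no field enclosure); `parse_rows`, the column names and the sample values are invented for the example and are not taken from the IPT or pipelines.

```python
def parse_rows(text: str) -> list[list[str]]:
    """Naive reader: lines end with \\n, fields are split on tabs, no quoting."""
    return [line.split("\t") for line in text.split("\n") if line]


label = "line one\nline two"  # a two-line verbatimLabel (made-up value)

# Written as-is, the newline inside the value ends the row early.
raw = "id\tverbatimLabel\n" + "1\t" + label + "\n"
raw_rows = parse_rows(raw)

# With the newline escaped as backslash + n, the record stays on one row.
escaped = "id\tverbatimLabel\n" + "1\t" + label.replace("\n", "\\n") + "\n"
escaped_rows = parse_rows(escaped)

print(len(raw_rows), len(escaped_rows))
```

In the unescaped case the reader sees the header plus two partial rows instead of one record, which is the problem escaping is meant to avoid.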

Hi all,

The new terms are now ready for use in the IPT, and will be interpreted by pipelines and shown on GBIF.org and in the API.