gbif/pipelines

State/Province dictionary review


The file https://github.com/gbif/pipelines/blob/dev/livingatlas/pipelines/src/main/resources/stateProvinces.tsv contains some oddities:

  • New Zealand is present as a state or province under Nz
  • There are a number of apparent encoding problems, e.g. "Gharda√Øa" or "H√§rjedalen" (presumably "Ghardaïa" and "Härjedalen")
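The two garbled names quoted above look like UTF-8 bytes that were decoded as Mac OS Roman at some point: the bytes C3 AF ("ï" in UTF-8) render as "√Ø" in that encoding. A minimal Python sketch for flagging and repairing such entries follows, assuming that is the only kind of damage present. The helper names `repair_mojibake` and `find_suspect_lines` are made up for illustration and are not part of the pipelines code base.

```python
def repair_mojibake(text: str) -> str:
    """Undo UTF-8 bytes that were mis-decoded as Mac OS Roman.

    Returns the input unchanged when the round trip fails, i.e. when the
    text does not look like this particular kind of damage.
    """
    try:
        return text.encode("mac_roman").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


def find_suspect_lines(lines):
    """Yield (line_number, original, repaired) for lines the round trip changes."""
    for number, line in enumerate(lines, start=1):
        repaired = repair_mojibake(line)
        if repaired != line:
            yield number, line, repaired


# The two examples quoted in this issue.
for sample in ("Gharda√Øa", "H√§rjedalen"):
    print(sample, "->", repair_mojibake(sample))
```

Correctly encoded names such as "Härjedalen" pass through untouched, because their Mac OS Roman bytes are not valid UTF-8 and the helper falls back to the original string. `find_suspect_lines` accepts any iterable of strings, so it could be pointed at the lines of the dictionary file opened as UTF-8 to list candidates for manual review.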

This causes the stateProvince parser to identify a region of NZ as a state when downloading species lists.

There's probably another lurking bug. The species list download maps country names onto the Country enum name(). For New Zealand that may be NEW_ZEALAND, which may not match when detecting country conservation status.
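The suspected mismatch can be illustrated with a small Python stand-in. Everything here is hypothetical: the `Country` class only mimics the shape of the Java enum (constant name versus human-readable title), and the lookup table and its placeholder value are invented for the example. Whether the real conservation status lookup is keyed by the readable name has not been verified.

```python
from enum import Enum


class Country(Enum):
    # Hypothetical stand-in for the Java Country enum: constant name on the
    # left, human-readable title on the right.
    NEW_ZEALAND = "New Zealand"
    AUSTRALIA = "Australia"


# Hypothetical lookup keyed by the human-readable country name.
# The value is a placeholder, not real data.
conservation_status_by_country = {"New Zealand": "example status"}


def lookup_status(key):
    """Return the status for a key, or None when the key is not present."""
    return conservation_status_by_country.get(key)


country = Country.NEW_ZEALAND
by_name = lookup_status(country.name)    # key is "NEW_ZEALAND"
by_title = lookup_status(country.value)  # key is "New Zealand"
```

In this sketch `by_name` is `None` while `by_title` finds the entry, which is the kind of silent miss described above if the download passes the enum constant name where the readable name is expected.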