gbif/pipelines

`livingatlas` ES indexing is missing fields

Please assign to @djtfmartin

Having indexed the same occurrence datasets with both SOLR and ES, I'm seeing the following data/fields missing from the Elasticsearch index:

  • missing array type fields: assertions, geospatialIssues, speciesListUid, speciesGroups, speciesSubgroups
  • missing ALA custom lists-generated fields: countryConservation, raw_countryConservation, stateConservation, raw_stateConservation, etc.
  • missing value for license field (null value shown in ES index but SOLR version has a value)
  • taxonomy issues:
    • missing data: scientificNameAuthorship is in SOLR, but no field in ES holds any authorship data. I'd expect to see it in the gbifClassification section
    • GBIF has a gbifClassification:usage:formattedName field that is required by the React components - it would be good to implement this in the ALA pipeline (SOLR does not currently provide it either)
    • vernacularName is missing
    • names_and_lsid is missing (do we care?)
    • taxonRankID is missing
    • nameType is missing
  • DwC fields missing: provenance, geodeticDatum, contentTypes
  • ALA fields missing: decade, month, spatiallyValid, outlierLayerCount
  • type: EVENT appears to be hard-coded; it should be OCCURRENCE
  • sampling data appears to be missing but may be a data issue on my part

I did try adding entries for a few of these missing fields to the ES schema file but it didn't make a difference.
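
For anyone wanting to reproduce the comparison, here is a minimal sketch of how the field diff could be scripted. It assumes you have already fetched the same record from both indexes as plain Python dicts, and that SOLR field names can be matched against the leaf names of the (nested) ES document. Both are assumptions on my part, and the helper names `flatten_keys` and `missing_in_es` and the sample documents are hypothetical, not part of the pipelines code.

```python
from typing import Any, Dict, List, Set


def flatten_keys(doc: Dict[str, Any], prefix: str = "") -> Set[str]:
    """Return dotted paths of all fields that hold a non-empty value."""
    keys: Set[str] = set()
    for name, value in doc.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(value, dict):
            # Recurse into nested objects such as gbifClassification.
            keys |= flatten_keys(value, path)
        elif value is not None and value != [] and value != "":
            # Null, empty-list and empty-string values count as missing.
            keys.add(path)
    return keys


def missing_in_es(solr_doc: Dict[str, Any], es_doc: Dict[str, Any]) -> List[str]:
    """List SOLR fields with no populated ES field of the same leaf name."""
    solr_fields = flatten_keys(solr_doc)
    # Assumption: compare on the last path segment, since ES nests fields.
    es_leaf_names = {path.rsplit(".", 1)[-1] for path in flatten_keys(es_doc)}
    return sorted(f for f in solr_fields if f not in es_leaf_names)


# Made-up sample documents, for illustration only.
solr_doc = {
    "scientificName": "Acacia",
    "license": "CC-BY",
    "assertions": ["EXAMPLE_ASSERTION"],
    "decade": 1990,
}
es_doc = {
    "license": None,  # present in the index but null
    "gbifClassification": {"usage": {"scientificName": "Acacia"}},
}

print(missing_in_es(solr_doc, es_doc))
```

Treating null and empty values as missing means a field like license, which exists in ES but is null, is reported alongside fields that are absent altogether.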