gbif/pipelines

Reprocessing old event datasets causes registry state update issues


Reprocessing an event dataset that had been unchanged for 2 years left it stuck in "running ingestions".
After forcing it to finish in the ingestion history and retrying, the result appeared in the history as shown in the screenshot below.

[Screenshot: ingestion history, page 8 (attempt 24)]

The suspicion is that this dataset was last processed before the Event pipelines were deployed, and that there is some kind of mismatch in the messages being passed between the pipeline steps.

We could fix up the code, or perhaps it is simpler to touch all the DwC-A files to an earlier date and force-crawl all the old event datasets to avoid this situation?

Fixed history data for all related datasets
Page 8, attempt 24
https://registry.gbif.org/dataset/c1e31227-6595-4797-b75a-d9d9f75e4cca/ingestion-history

NB: mass force-crawling is generally not a good solution, as some datasets will have gone offline (temporarily or permanently).

@MattBlissett
The reinterpretation scripts start from the verbatim-to-interpreted information, with some additional steps. I suggest using the same approach here. Starting from dwca-avro only makes sense if the schema for extended-record.avro has been modified.
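The rule in the comment above (rerun from dwca-avro only when the extended-record.avro schema changed, otherwise start from verbatim-to-interpreted) can be sketched as a small decision function. This is a minimal illustration, not code from the pipelines repository: the step-name constants, the function names and the `ExtendedRecord` schema snippets are all hypothetical, and the fingerprint is a simplified stand-in that only normalises JSON key order and whitespace (it is not Avro's Parsing Canonical Form).

```python
import hashlib
import json

# Hypothetical step labels, for illustration only; not taken from the pipelines code.
START_FROM_DWCA = "DWCA_TO_VERBATIM"
START_FROM_VERBATIM = "VERBATIM_TO_INTERPRETED"


def schema_fingerprint(schema_json: str) -> str:
    """Hash a JSON schema after normalising object key order and whitespace.

    Array order (e.g. the order of Avro record fields) is preserved, since it
    is significant for Avro.
    """
    normalised = json.dumps(json.loads(schema_json), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def choose_start_step(old_schema_json: str, new_schema_json: str) -> str:
    """Rerun from the archive only when the extended-record schema has changed."""
    if schema_fingerprint(old_schema_json) != schema_fingerprint(new_schema_json):
        return START_FROM_DWCA
    return START_FROM_VERBATIM


if __name__ == "__main__":
    # Hypothetical schema snippets: the same schema with reordered keys, and a changed one.
    old = '{"type": "record", "name": "ExtendedRecord", "fields": []}'
    same = '{"name": "ExtendedRecord", "type": "record", "fields": []}'
    changed = '{"type": "record", "name": "ExtendedRecord", "fields": [{"name": "id", "type": "string"}]}'
    print(choose_start_step(old, same))     # VERBATIM_TO_INTERPRETED
    print(choose_start_step(old, changed))  # DWCA_TO_VERBATIM
```

Comparing fingerprints rather than raw strings keeps a purely cosmetic reformatting of the schema file from triggering a full rerun from the archive.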