gbif/watchdog

Orphan datasets from Germany

Opened this issue · 7 comments

In Germany's list of orphans, all 15 datasets that are owned by Staatliche Naturwissenschaftliche Sammlungen Bayerns are false positives and should not be rescued. These are due to this bug in GBIF's crawling service.

Below is the latest analysis of the remaining orphan datasets, conducted by @jholetschek. He is still awaiting replies from several hosts/curators/publishers to understand whether their data can come back online. Based on the results, GBIF will need to perform at least one dataset deletion and change dataset endpoint URLs in the GBIF Registry.

Germany orphaned.xlsx

Thanks @jholetschek for identifying that 8ea44a78-c6af-11e2-9b88-00145eb45e9a is back online.

Dataset https://www.gbif.org/dataset/85c8e444-f762-11e1-a439-00145eb45e9a is back online on a new BioCASe installation with 38,154 occurrences.

That's great news @jholetschek, thanks. That still leaves 43 candidate orphan datasets in Germany that GBIFS hasn't been able to re-index in the last 6 months, as you can see here: https://github.com/gbif/watchdog/wiki/AdoptionPlan

This list still contains 5 datasets that are online.

Thanks @jholetschek

I updated the URLs for the 2 datasets from Friedrich-Alexander University of Erlangen-Nürnberg and triggered a re-crawl for them. I also triggered a re-crawl for the 3 datasets from Georg-August-Universität Göttingen. Hopefully they all finish crawling successfully this time.

Thanks a lot, Kyle! Seems all five datasets have been crawled successfully now.
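
For anyone who wants to double-check a re-crawl from outside, here is a minimal sketch that asks the public GBIF occurrence search API how many records are currently indexed for a dataset key. It assumes the `occurrence/search` endpoint with a `datasetKey` filter and a `count` field in the JSON response; the helper names `occurrence_count_url` and `indexed_occurrences` are made up for illustration and are not part of any GBIF tooling.

```python
import json
import urllib.parse
import urllib.request

# Base URL of the public GBIF API (assumption: v1 is the version in use).
GBIF_API = "https://api.gbif.org/v1"


def occurrence_count_url(dataset_key: str) -> str:
    """Build an occurrence search URL that returns only the record count.

    limit=0 asks for no result rows, so the response stays small.
    """
    query = urllib.parse.urlencode({"datasetKey": dataset_key, "limit": 0})
    return f"{GBIF_API}/occurrence/search?{query}"


def indexed_occurrences(dataset_key: str, timeout: float = 30.0) -> int:
    """Return the number of occurrences indexed for the dataset (hypothetical helper)."""
    url = occurrence_count_url(dataset_key)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # The search response is assumed to carry the total in its "count" field.
        return json.load(resp)["count"]
```

Calling `indexed_occurrences("85c8e444-f762-11e1-a439-00145eb45e9a")` would then give the number to compare against what the publisher's installation reports. A count of zero after a crawl that was expected to succeed would be a hint to look at the endpoint URL again.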

Concerning https://www.gbif.org/dataset/ad0d1a24-e952-11e2-961f-00145eb45e9a: I'll meet with the curator next week and will try to convince him to bring the dataset back online.

Dataset https://www.gbif.org/dataset/ad0d1a24-e952-11e2-961f-00145eb45e9a is back online and can be crawled again.