gbif/pipelines

Taxonomic matching


When a record is submitted via the IPT that contains a valid scientificNameID and a scientificName, the scientificNameID should be considered authoritative.

See https://discourse.gbif.org/t/millipedes-in-the-ocean/3991

The core of the problem here is that GBIF is using the scientificName instead of the scientificNameID (in this case, an AphiaID). The latter should be definitive, and is correct on the MBA records. The scientificName should only be used if the scientificNameID is not present. It's true that, for some reason, our scientificName didn't match the scientificNameID, but OBIS harvests these same records and gets the classifications right. (I am a little surprised that EurOBIS, which has very stringent checking of taxonomy, had not rejected these records because the scientificName didn't match the AphiaID, but I can hardly blame them for our bad data!)
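The precedence rule described above could be sketched roughly as follows. This is only an illustration, not the GBIF pipelines implementation: `parse_aphia_id` and `choose_match_key` are hypothetical helpers, and the sketch assumes the identifier arrives as a WoRMS LSID of the form `urn:lsid:marinespecies.org:taxname:<AphiaID>`.

```python
import re
from typing import Optional, Tuple

# Assumed identifier layout: urn:lsid:marinespecies.org:taxname:<AphiaID>
APHIA_LSID = re.compile(r"^urn:lsid:marinespecies\.org:taxname:(\d+)$")


def parse_aphia_id(scientific_name_id: Optional[str]) -> Optional[int]:
    """Return the numeric AphiaID if the value looks like a WoRMS LSID, else None."""
    if not scientific_name_id:
        return None
    match = APHIA_LSID.match(scientific_name_id.strip())
    return int(match.group(1)) if match else None


def choose_match_key(record: dict) -> Tuple[str, object]:
    """Hypothetical helper: decide which field drives taxonomic matching."""
    aphia_id = parse_aphia_id(record.get("scientificNameID"))
    if aphia_id is not None:
        # A usable identifier is present, so it takes precedence over the name.
        return ("scientificNameID", aphia_id)
    # Fall back to the name only when no usable identifier is present.
    return ("scientificName", (record.get("scientificName") or "").strip())
```

With this rule, a record carrying both fields is matched on the identifier even when the two disagree, which is the behaviour being asked for here.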

As for GBIF "fixing" the data, please don't. We're always ready to fix our own once we know there's an issue. Perhaps some data providers do ignore flags, but if this had been brought to our attention earlier, we'd have fixed it (and have done now, though I'm not sure how soon the data will be republished).

Hi,

Thanks for highlighting this, Derek.
In EurOBIS we have an internal check (soon to be part of our public QC tool http://rshiny.lifewatch.be/BioCheck/) that compares the AphiaID under scientificNameID with the value under scientificName. So we do consider it relevant to use scientificName, which holds the original identification, together with scientificNameID for this cross-check.
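A cross-check of this kind might look something like the sketch below. It is not the EurOBIS or BioCheck code: `crosscheck_name`, the flag strings, and the `reference_names` dictionary (standing in for a lookup against the taxonomic register) are all assumptions made for illustration.

```python
from typing import Dict, Optional


def _normalise(name: str) -> str:
    """Lower-case the name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def crosscheck_name(record: dict, reference_names: Dict[str, str]) -> Optional[str]:
    """Return a flag when scientificName disagrees with the name registered
    for scientificNameID, or None when the two agree."""
    name_id = (record.get("scientificNameID") or "").strip()
    supplied = record.get("scientificName") or ""
    if not name_id or name_id not in reference_names:
        # Nothing to compare against, so the record cannot be checked.
        return "ID_NOT_CHECKED"
    if _normalise(supplied) != _normalise(reference_names[name_id]):
        return "NAME_ID_MISMATCH"
    return None
```

The comparison is deliberately naive: a scientificName that includes authorship would be flagged against a reference name without it, so a real check would need to parse the name first.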

This issue has, however, given me an idea on how to improve that taxonomy check by also adding the higher classification to it.
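One way such a higher-classification check could work is sketched below. The function name `classification_conflicts`, the list of ranks, and the idea of passing in a `reference` classification already resolved from the scientificNameID are all assumptions for illustration; the values in the usage example are made up and are not taken from the records discussed here.

```python
from typing import Dict, List

# Ranks compared, from highest to lowest (an assumed selection).
RANKS = ("kingdom", "phylum", "class", "order", "family")


def classification_conflicts(record: dict, reference: Dict[str, str]) -> List[str]:
    """List the ranks where the record's value differs from the reference
    classification resolved for its scientificNameID."""
    conflicts = []
    for rank in RANKS:
        supplied = (record.get(rank) or "").strip().lower()
        expected = (reference.get(rank) or "").strip().lower()
        # Only compare ranks that both sides actually populate.
        if supplied and expected and supplied != expected:
            conflicts.append(rank)
    return conflicts


# Illustrative values only: a marine record interpreted as a millipede.
interpreted = {"kingdom": "Animalia", "phylum": "Arthropoda", "class": "Diplopoda"}
expected = {"kingdom": "Animalia", "phylum": "Arthropoda", "class": "Malacostraca"}
print(classification_conflicts(interpreted, expected))
```

A conflict at a high rank such as class is a much stronger signal of a bad match than a spelling difference in the name alone.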

Thank you!

bart-v commented

FYI: GBIF not using the ScientificNameID is a known issue #217
And it's a shame: why are we using PIDs at all, then...

ymgan commented

@bart-v I agree. Please see #895 for a different concern that arises when scientificNameID is not interpreted.
It can confuse data users, and our data provider was confused about why this was happening when they had done their best to provide data with the utmost clarity.

I'll close this and link to the original issue that already captures it, #217.

Please don't close issues as "completed" when they're not. This should have been "merged" into #217.

Sorry @derek-mba

GitHub doesn't have a merge option for issues, so I linked them and closed this only to try and keep the discussion together on the original issue. The alternative was to close this using the "won't fix" option.

I'll reopen this

With #217 closed with an implementation, I'll also close this again, as I don't think there is anything here that isn't covered in that thread. Please comment if I am mistaken.