internetarchive/fatcat

Duplicated DOIs from datacite import

The DataCite bulk dump seems to contain some duplicated adjacent (or near-adjacent?) rows. As a result, the import created the same DOI multiple times within the same editgroup, leaving at least 8000 duplicate DOIs.
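
A rough sketch of how the duplicated rows could be counted in the dump before (re-)importing. It assumes the dump is JSON lines with the DOI at `attributes.doi`; that layout and the name `duplicated_dois` are assumptions for illustration, not taken from the importer. DOIs are compared lowercased because they are case-insensitive.

```python
import json
from collections import Counter
from typing import Dict, Iterable


def duplicated_dois(lines: Iterable[str]) -> Dict[str, int]:
    """Return {doi: count} for every DOI that occurs more than once."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Assumed record layout: {"attributes": {"doi": "..."}}
        doi = record.get("attributes", {}).get("doi")
        if doi:
            counts[doi.lower()] += 1  # DOIs are case-insensitive
    return {doi: n for doi, n in counts.items() if n > 1}


# Made-up sample rows, with one adjacent duplicate.
sample = [
    '{"attributes": {"doi": "10.1234/abc"}}',
    '{"attributes": {"doi": "10.1234/ABC"}}',
    '{"attributes": {"doi": "10.5678/xyz"}}',
]
dupes = duplicated_dois(sample)
```

Counting over the whole file rather than only comparing neighbouring rows also catches the "near-adjacent" case, at the cost of holding every DOI in memory.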

The cleanup is to merge the duplicate releases (redirecting one to the other), presumably using the same tooling as the PubMed, container, and other cleanups.
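
A minimal sketch of the planning step for such a merge: group release idents by DOI, keep one per group as the primary, and list the others to be redirected to it. The function name `plan_merges`, the `(ident, doi)` input shape, and the "keep the smallest ident" rule are all placeholders; the real selection rule and the API calls that perform the redirect are not specified here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def plan_merges(releases: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each primary release ident to the duplicate idents to redirect to it."""
    by_doi = defaultdict(list)
    for ident, doi in releases:
        by_doi[doi.lower()].append(ident)  # DOIs are case-insensitive

    plan = {}
    for idents in by_doi.values():
        # Placeholder rule: keep the lexicographically smallest ident.
        primary, *duplicates = sorted(set(idents))
        if duplicates:
            plan[primary] = duplicates
    return plan


# Hypothetical idents, for illustration only.
releases = [
    ("rel_b", "10.1234/ABC"),
    ("rel_a", "10.1234/abc"),
    ("rel_c", "10.5678/xyz"),
]
plan = plan_merges(releases)
```

Keeping the plan as plain data makes it easy to review a sample of the proposed redirects before applying any of them.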

cc: @miku