Dupes with extract-bibs

Question


paulusm opened this issue 10 years ago · 1 comment

The extracted BibTeX files often contain exact duplicate entries, which causes problems when I try to parse them.

Answer 1 · 2014-09-25T16:51:56.000Z

Yeah, this is a pain, but I don't think there is an easy fix. It can happen for two reasons:

the PDF parser incorrectly splits a single reference into two, both of which resolve to the same DOI,
or the web API incorrectly maps two different references (say, with similar authors) to the same DOI.

In either case, it would be hard to guarantee there are no duplicates. I usually use a BibTeX manager like JabRef or BibDesk to clean things up and remove duplicates before merging into the main .bib file. I wouldn't trust the extract-bibs output to go straight to compiling without cleaning it up first.
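If you would rather script a first cleanup pass than do it by hand in a bib manager, here is a minimal Python sketch. It is not part of extract-bibs, and the function names are hypothetical. It assumes each entry starts with "@type{key," at the beginning of a line and that DOIs are written as doi = {...} or doi = "..."; the splitter is deliberately naive and is not a full BibTeX parser. It drops verbatim copies (comparing with whitespace collapsed) but only reports entries that share a DOI, because under the second cause those may be genuinely different references that were mapped to the same DOI, and deleting one automatically could lose a real citation.

```python
import re
from collections import defaultdict

# Assumption: every entry begins with "@type{key," at the start of a line.
# This is a naive splitter for illustration, not a full BibTeX parser.
ENTRY_START = re.compile(r"^@\w+\s*\{", re.MULTILINE)
ENTRY_KEY = re.compile(r"^@\w+\s*\{\s*([^,\s]+)\s*,")
DOI_FIELD = re.compile(r'\bdoi\s*=\s*[{"]([^}"]+)[}"]', re.IGNORECASE)


def split_entries(text):
    """Split .bib text into raw entry strings; text before the first '@' is dropped."""
    starts = [m.start() for m in ENTRY_START.finditer(text)]
    starts.append(len(text))
    return [text[a:b].strip() for a, b in zip(starts, starts[1:])]


def drop_exact_duplicates(entries):
    """Keep the first copy of each entry, comparing with whitespace collapsed."""
    seen = set()
    kept = []
    for entry in entries:
        fingerprint = " ".join(entry.split())
        if fingerprint not in seen:
            seen.add(fingerprint)
            kept.append(entry)
    return kept


def keys_sharing_doi(entries):
    """Map each DOI (lowercased) used by more than one entry to those entries' keys."""
    groups = defaultdict(list)
    for entry in entries:
        doi = DOI_FIELD.search(entry)
        key = ENTRY_KEY.match(entry)
        if doi and key:
            groups[doi.group(1).strip().lower()].append(key.group(1))
    return {doi: keys for doi, keys in groups.items() if len(keys) > 1}
```

A typical pass would read the extracted file, run split_entries, pass the result through drop_exact_duplicates, and then check any keys reported by keys_sharing_doi by hand (or in JabRef/BibDesk) before merging into the main .bib file.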