Dupes with extract-bibs
paulusm opened this issue · 1 comments
paulusm commented
The extracted bibtex files often seem to contain exact duplicate entries, which is causing me issues when trying to parse them.
jdherman commented
Yeah, this is a pain, but I don't think there is an easy fix. This can happen for two reasons:
- the PDF parser incorrectly splits a single reference into two, and both pieces then resolve to the same DOI,
- or the web API incorrectly maps two different references (say, with similar authors) to the same DOI.
In either case it would be tough to guarantee no duplicates. I usually use a BibTeX manager like JabRef or BibDesk to clean things up and remove duplicates before merging into the main bib file. I wouldn't trust the extract-bibs output to go straight to compiling without cleaning it up first.