pcubillos/bibmanager

Bug: Regression from #90, false duplicates detected for @incollection book entries


Issue #89, resolved by PR #90 and merged into version 1.3.4, dealt with false duplicate detections. I am now running version 1.4.5, installed via conda-forge, and I am hitting the same issue.

Here is a false duplicate detection: the ISBNs are the same, but the DOIs are different.

The expected behavior is that bibmanager does not complain during a merge with these two entries, or with other similar entries.

Note: this is possibly a DOI-parsing issue, as these DOIs contain some LaTeX escape sequences.

DATABASE:
@incollection{OConnor2017,
    author = "O’Connor, Evan",
    editor = "Alsabti, A. W. and Murdin, P",
    title = "{The Core-Collapse Supernova-Black Hole Connection}",
    year = "2016",
    booktitle = "Handbook of Supernovae",
    pages = "1--18",
    publisher = "Springer",
    url = "https://doi.org/10.1007/978-3-319-20794-0\_129-1 http://link.springer.com/10.1007/978-3-319-20794-0\_129-1",
    address = "Cham",
    isbn = "9783319218465",
    doi = "10.1007/978-3-319-20794-0{\\_}129-1"
}

NEW:
@incollection{Alsabti2016,
    title = {{Supernovae and Supernova Remnants: The Big Picture in Low Resolution}},
    year = {2017},
    booktitle = {Handbook of Supernovae},
    author = {Alsabti, Athem W. and Murdin, Paul},
    editor = {Alsabti, A.~W. and Murdin, P},
    pages = {3--28},
    publisher = {Springer, Cham},
    isbn = {9783319218465},
    doi = {10.1007/978-3-319-21846-5{\_}1},
    keywords = {Physics}
}

OK, I found where it happens: duplicate ISBNs from book chapters are not filtered out during a bibm merge. The merge runs:

filter_field(bibs, new, "isbn", take)

filter_field does not have the same-ISBN/separate-DOI check that the remove_duplicates function in the same module performs, where the logic is:

    # If field is isbn, check doi to differentiate chapters from same book:
    if field == 'isbn':
        dois = [
            bibs[idx].doi if bibs[idx].doi is not None else ""
            for idx in indices]
        u_doi, doi_inv, doi_counts = np.unique(
            dois, return_inverse=True, return_counts=True)
        single_dois = u_doi[doi_counts==1]
        indices = [
            idx for idx,doi in zip(indices,dois)
            if doi not in single_dois]
        nbibs = len(indices)
        if nbibs <= 1:
            continue
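
To make the intent concrete, here is a toy example (my own, not from bibmanager) of what that np.unique bookkeeping does for the two chapters above:

import numpy as np

# The two DOIs from the entries above (escape characters stripped):
dois = ["10.1007/978-3-319-20794-0_129-1", "10.1007/978-3-319-21846-5_1"]
u_doi, doi_inv, doi_counts = np.unique(
    dois, return_inverse=True, return_counts=True)
single_dois = u_doi[doi_counts == 1]
# Each DOI occurs exactly once, so both entries are removed from the
# duplicate-candidate list and no false duplicate is reported:
assert len(single_dois) == 2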

I suggest separating out the logic that checks for ISBN duplicates that do not share a DOI (i.e., chapters of the same book), and then calling it from filter_field when the field argument is 'isbn'.
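
For illustration, a minimal sketch of what that factored-out check could look like (the function name and placement are my own suggestion, not bibmanager's API; it only uses the Bib attributes isbn and doi seen above):

def same_book_different_chapter(bib1, bib2):
    """
    Hypothetical helper: return True when two entries share an ISBN but
    have different DOIs, i.e., they look like distinct chapters of the
    same book and should not be treated as duplicates.
    """
    if bib1.isbn is None or bib1.isbn != bib2.isbn:
        return False
    doi1 = bib1.doi if bib1.doi is not None else ""
    doi2 = bib2.doi if bib2.doi is not None else ""
    return doi1 != doi2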

And here is the filter_field function that would need to be modified:

def filter_field(bibs, new, field, take):
    """
    Filter duplicate entries by field between new and bibs.
    This routine modifies new removing the duplicates, and may modify
    bibs (depending on take argument).

    Parameters
    ----------
    bibs: List of Bib() objects
        Database entries.
    new: List of Bib() objects
        New entries to add.
    field: String
        Field to use for filtering.
    take: String
        Decision-making protocol to resolve conflicts when there are
        duplicated entries:
        'old': Take the database entry over new.
        'new': Take the new entry over the database.
        'ask': Ask user to decide (interactively).
    """
    fields = [getattr(bib,field) for bib in bibs]
    removes = []
    for i,bib in enumerate(new):
        if getattr(bib,field) is None or getattr(bib,field) not in fields:
            continue
        idx = fields.index(getattr(bib,field))
        # Replace if duplicated and new has newer bibcode:
        if bib.published() > bibs[idx].published() or take == 'new':
            bibs[idx].update_content(bib)
        # Look for different-key conflict:
        if bib.key != bibs[idx].key and take == "ask":
            display_bibs(["DATABASE:\n", "NEW:\n"], [bibs[idx], bib])
            s = u.req_input(
                f"Duplicate {field} field but different keys, []keep "
                "database or take [n]ew: ",
                options=["", "n"])
            if s == "n":
                bibs[idx].update_content(bib)
        removes.append(i)
    for idx in reversed(sorted(removes)):
        new.pop(idx)
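
One possible shape of the fix inside filter_field's loop, reusing the helper sketched above (again just a sketch, not necessarily what ends up implemented):

    for i,bib in enumerate(new):
        if getattr(bib,field) is None or getattr(bib,field) not in fields:
            continue
        idx = fields.index(getattr(bib,field))
        # Same ISBN but different DOIs: distinct chapters, not duplicates:
        if field == 'isbn' and same_book_different_chapter(bib, bibs[idx]):
            continue
        # ... rest of the duplicate handling as above ...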

We should probably also add a bibm merge test that includes entries with a duplicate ISBN but separate DOIs :)
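
A rough sketch of such a test (assuming filter_field is importable from bibmanager.bib_manager as referenced above, and that the Bib constructor accepts a BibTeX entry string; adjust to the real API as needed). With the current code this should fail, which is the point of a regression test:

import bibmanager.bib_manager as bm

def test_filter_field_isbn_different_doi():
    # Two chapters of the same book: same ISBN, different DOIs.
    chapter1 = bm.Bib("""@incollection{OConnor2017,
        author = "O'Connor, Evan",
        title = "{The Core-Collapse Supernova-Black Hole Connection}",
        year = "2016",
        booktitle = "Handbook of Supernovae",
        isbn = "9783319218465",
        doi = "10.1007/978-3-319-20794-0_129-1"
    }""")
    chapter2 = bm.Bib("""@incollection{Alsabti2016,
        author = {Alsabti, Athem W. and Murdin, Paul},
        title = {{Supernovae and Supernova Remnants: The Big Picture in Low Resolution}},
        year = {2017},
        booktitle = {Handbook of Supernovae},
        isbn = {9783319218465},
        doi = {10.1007/978-3-319-21846-5_1}
    }""")
    bibs = [chapter1]
    new = [chapter2]
    bm.filter_field(bibs, new, "isbn", take="old")
    # The new chapter must not be dropped as a duplicate:
    assert len(new) == 1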

Hi Emir,
thanks a lot for the detailed report! Version 1.4.6 should fix this issue (pip is updated, conda should take a few hours to get up to date). Please take a look when you have time and let me know whether things are running ok.