/reffix

A tool for fixing a BibTeX reference list using DBLP API

Primary LanguagePythonMIT LicenseMIT

reffix: Fixing BibTeX reference list with DBLP API 🔧

reffix GitHub GitHub issues PyPI PyPI downloads Github stars

➡️ Reffix is a simple tool for improving the BibTeX list of references in your paper. It can fix several common errors such as incorrect capitalization, missing URLs, or using arXiv pre-prints instead of published version.

➡️ Reffix queries the DBLP API, so it does not require any local database of papers.

➡️ Reffix uses a conservative approach to keep your bibliography valid.

➡️ The tool is developed with NLP papers in mind, but it can be used on any BibTeX list of references containing computer science papers present on DBLP.

Quickstart

👉️ You can now install reffix from PyPI:

pip install -U reffix
reffix [BIB_FILE]

See the Installation and Usage section below for more details.

Example

Before the update (Google Scholar):

  • ❎ arXiv version
  • ❎ no URL
  • ❎ capitalization lost
 {  
    'ENTRYTYPE': 'article',
    'ID': 'duvsek2020evaluating',
    'author': 'Du{\\v{s}}ek, Ond{\\v{r}}ej and Kasner, Zden{\\v{e}}k',
    'journal': 'arXiv preprint arXiv:2011.10819',
    'title': 'Evaluating semantic accuracy of data-to-text generation with '
             'natural language inference',
    'year': '2020'
}

After the update (DBLP + preserving capitalization):

  • ✔️ ACL version
  • ✔️ URL included
  • ✔️ capitalization preserved
 {   
    'ENTRYTYPE': 'inproceedings',
    'ID': 'duvsek2020evaluating',
    'author': 'Ondrej Dusek and\nZdenek Kasner',
    'bibsource': 'dblp computer science bibliography, https://dblp.org',
    'biburl': 'https://dblp.org/rec/conf/inlg/DusekK20.bib',
    'booktitle': 'Proceedings of the 13th International Conference on Natural '
                 'Language\n'
                 'Generation, {INLG} 2020, Dublin, Ireland, December 15-18, '
                 '2020',
    'editor': 'Brian Davis and\n'
              'Yvette Graham and\n'
              'John D. Kelleher and\n'
              'Yaji Sripada',
    'pages': '131--137',
    'publisher': 'Association for Computational Linguistics',
    'timestamp': 'Mon, 03 Jan 2022 00:00:00 +0100',
    'title': '{Evaluating} {Semantic} {Accuracy} of {Data-to-Text} '
             '{Generation} with {Natural} {Language} {Inference}',
    'url': 'https://aclanthology.org/2020.inlg-1.19/',
    'year': '2020'
}

Main features

  • Completing referencesreffix queries the DBLP API with the paper title and the first author's name to find a complete reference for each entry in the BibTeX file.
  • Replacing arXiv preprintsreffix can try to replace arXiv pre-prints with the version published at a conference or in a journal whenever possible.
  • Preserving titlecase – in order to preserve correct casing, reffix wraps individual uppercased words in the paper title in curly brackets.
  • Conservative approach:
    • the original .bib file is preserved
    • no references are deleted
    • papers are updated only if the title and at least one of the authors match
    • the version of the paper corresponding to the original entry should be selected first
  • Interactive mode – you can confirm every change manually.

The package uses bibtexparser for parsing the BibTex files, DBLP API for updating the references, and the titlecase package for optional extra titlecasing.

Installation

You can install reffix from PyPI:

pip install reffix

For development, you can install the package in the editable mode:

pip install -e .[dev]

Usage

Run the script with the .bib file as the first argument:

reffix [IN_BIB_FILE]

By default, the program will run in batch mode, save the outputs in the file with an extra ".fixed" suffix, and keep the arXiv versions.

The following command will run reffix in interactive mode, save the outputs to a custom file, and replace arXiv versions:

reffix [IN_BIB_FILE] -o [OUT_BIB_FILE] -i -a

Flags

short long description
-o --out Output filename. If not specified, the default filename <original_name>.fixed.bib is used.
-i --interact Interactive mode. Every replacement of an entry with DBLP result has to be confirmed manually.
-a --replace-arxiv Replace arXiv versions. If a non-arXiv version (e.g. published at a conference or in a journal) is found at DBLP, it is preferred to the arXiv version.
-t --force-titlecase Force titlecase for all entries. The titlecase package is used to fix casing of titles which are not titlecased. (Note that the capitalizaton rules used by the package may be a bit different.)
-s --sort-by Multiple sort conditions compatible with bibtexparser.BibTexWriter applied in the provided order. Example: -s ENTRYTYPE year sorts the list by the entry type as its primary key and year as its secondary key. ID can be used to refer to the Bibtex key. The default None value keeps the original order of Bib entries.
--no-publisher Suppress publishers in conference papers and journals (still kept for books).
--process-conf-loc Parse conference dates and locations, remove from proceedings names, store locations under address.
--no-formatting Disable automatic BibTeX formatting.

Notes

For lowering the amount of requests to the DBLP API, you can use the bibexport tool for generating a file compact.bib containing only the references used in the paper. As an input, use the file <myarticle>.aux created during compilation.

bibexport -o compact.bib <myarticle>.aux

Although reffix uses a conservative approach, it provides no guarantees that the output references are actually correct.

If you want to make sure that reffix does not introduce any unwanted changes, please use the interactive mode (flag -i).

The tool depends on DBLP API which may change any time in the future. I will try to update the script if necessary, but it may still occasionally break. I welcome any pull requests with improvements.

Please be considerate regarding the DBLP API and do not generate high traffic for their servers :-)

Contact

For any questions or suggestions, send an e-mail to kasner@ufal.mff.cuni.cz.