(Re-)generate tidy .bib
files from online sources
The gist of regenbib
is as follows.
Instead of manually maintaining a references.bib
file with a bunch of entries like this ...
@inproceedings{streamlet,
author = "Chan, Benjamin Y. and Shi, Elaine",
title = "Streamlet: Textbook Streamlined Blockchains",
booktitle = "{AFT}",
pages = "1--11",
publisher = "{ACM}",
year = "2020"
}
... you should maintain a references.yaml
file with corresponding entries like that:
entries:
- bibtexid: streamlet
dblpid: conf/aft/ChanS20
The tool regenbib
can then automatically (re-)generate the references.bib
from the references.yaml
in a consistent way by retrieving high-quality metadata information from the corresponding online source (in the example above: dblp's entry conf/aft/ChanS20).
The tool regenbib-import
helps to maintain the references.yaml
file. Using LaTeX's .aux
file, it determines entries that are cited but are currently missing from the references.yaml
file. It then helps the user determine an appropriate online reference through an interactive lookup right from the command line. In the lookup process, an old (possibly messy) references.bib
file can be used to obtain starting points for the search (eg, title/author in an old references.bib
entry can be used to lookup the paper on dblp).
See the usage example below for details.
If your LaTeX project already has a Python virtual environment, activate it. Otherwise, setup and activate a virtual environment like this:
$ python -m venv venv
$ echo "venv/" >> .gitignore
$ source venv/bin/activate
Then install regenbib
:
$ pip install git+https://github.com/joachimneu/regenbib.git
You should now have the commands regenbib
and regenbib-import
available to you.
Suppose we have an old references.bib
file with this entry (and suppose it does not have a corresponding entry in our references.yaml
file):
@misc{streamlet,
author = {Chan and Shi},
title = {Streamlet Textbook Streamlined Blockchains}
}
We can easily import a corresponding entry to our references.yaml
file with regenbib-import
:
$ regenbib-import --bib references.bib --aux _build/main.aux --yaml references.yaml
Importing entry: streamlet
-> Current entry: Entry('misc',
fields=[
('title', 'Streamlet Textbook Streamlined Blockchains')],
persons=OrderedCaseInsensitiveDict([('author', [Person('Chan'), Person('Shi')])]))
-> Import method? [0=skip, 1=dblp-free-search, 2=arxiv-manual-id, 3=eprint-manual-id, 4=current-entry, 5=dblp-search-title, 6=dblp-search-authorstitle]: 6
-----> The search returned 2 matches:
-----> (1) Benjamin Y. Chan, Elaine Shi:
Streamlet: Textbook Streamlined Blockchains. AFT 2020
https://doi.org/10.1145/3419614.3423256 https://dblp.org/rec/conf/aft/ChanS20
-----> (2) Benjamin Y. Chan, Elaine Shi:
Streamlet: Textbook Streamlined Blockchains. IACR Cryptol. ePrint Arch. (2020) 2020
https://eprint.iacr.org/2020/088 https://dblp.org/rec/journals/iacr/ChanS20
-----> Intended publication? [0=abort]: 1
As you see, regenbib-import
uses the messy/incomplete information from the old references.bib
file to help us quickly determine the appropriate dblp entry. This adds the following entry to references.yaml
:
entries:
- bibtexid: streamlet
dblpid: conf/aft/ChanS20
We can then re-generate a tidy references.bib
file based on the references.yaml
file:
$ regenbib --yaml references.yaml --bib references.bib
DblpEntry(bibtexid='streamlet', dblpid='conf/aft/ChanS20')
Entry('inproceedings',
fields=[
('title', 'Streamlet: Textbook Streamlined Blockchains'),
('booktitle', '{AFT}'),
('pages', '1--11'),
('publisher', '{ACM}'),
('year', '2020')],
persons=OrderedCaseInsensitiveDict([('author', [Person('Chan, Benjamin Y.'), Person('Shi, Elaine')])]))
$ cat references.bib
@inproceedings{streamlet,
author = "Chan, Benjamin Y. and Shi, Elaine",
title = "Streamlet: Textbook Streamlined Blockchains",
booktitle = "{AFT}",
pages = "1--11",
publisher = "{ACM}",
year = "2020"
}
See entry types in regenbib/store.py
:
- dblp
- arXiv
- IACR ePrint
- Raw
.bib
entry