RefDeduR is an R package that supports accurate and high-throughput reference deduplication. It is especially useful for large datasets and operates on standard bibliographic information (i.e., it does not require information, such as PMID, that cannot be retrieved from a mainstream search engine).
The deduplication pipeline is modularized into finely tuned text normalization, three-step exact matching, and two-step fuzzy matching processes. The package features a decision-tree algorithm and considers preprints and conference proceedings when they co-exist with a peer-reviewed version.
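As a rough illustration of the normalize, exact-match, then fuzzy-match sequence described above, here is a minimal base-R sketch. It does not call any RefDeduR function and is not the package's implementation: the helper `normalize_title()`, the sample records, the choice of title and year as matching fields, and the 0.1 similarity threshold are all assumptions made for this example. It also collapses the three exact-matching steps and two fuzzy-matching steps into one step each.

```r
# Hypothetical helper (not part of RefDeduR): normalize a title for comparison.
normalize_title <- function(x) {
  x <- tolower(x)
  x <- gsub("[^a-z0-9 ]", " ", x)  # replace punctuation and symbols with spaces
  x <- gsub("\\s+", " ", x)        # collapse runs of whitespace
  trimws(x)
}

# Made-up records for illustration only.
refs <- data.frame(
  title = c(
    "Deep Learning for Microbiome Analysis.",
    "Deep learning for microbiome analysis",
    "Deep learning for microbiome analyses",
    "Indoor air quality in office buildings"
  ),
  year = c(2020, 2020, 2020, 2019),
  stringsAsFactors = FALSE
)

refs$title_norm <- normalize_title(refs$title)

# Exact matching: drop records whose normalized title and year
# repeat those of an earlier record.
exact_dup <- duplicated(refs[, c("title_norm", "year")])
deduped <- refs[!exact_dup, ]

# Fuzzy matching: flag pairs of remaining records whose normalized titles
# differ by a small relative edit distance (the threshold is illustrative).
d <- adist(deduped$title_norm)
len <- nchar(deduped$title_norm)
rel <- d / outer(len, len, pmax)
candidates <- which(rel < 0.1 & upper.tri(rel), arr.ind = TRUE)

print(deduped$title)
print(candidates)
```

Each row of `candidates` holds the positions (within `deduped`) of two records that look like near-duplicates. In this sketch they are only flagged, not removed, so that a person or a later rule can decide which record to keep.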
Jiaxian Shen
Department of Civil and Environmental Engineering, Northwestern University
You can install RefDeduR from GitHub with:
# install.packages("devtools")
devtools::install_github("jxshen311/RefDeduR")
- For a step-by-step tutorial with an example dataset, see https://jxshen311.github.io/RefDeduR/articles/RefDeduR_tutorial.html.
- For a complete introduction, check out the website: https://jxshen311.github.io/RefDeduR/.
- For more information, check out the preprint on bioRxiv: https://www.biorxiv.org/content/10.1101/2022.09.29.510210v1.
If you use RefDeduR, please cite: https://www.biorxiv.org/content/10.1101/2022.09.29.510210v1
We thank Yutong Wu for the illuminating discussions about the design of RefDeduR. We are also grateful to Ruochen Jiao and Alexander G. McFarland for their help in coding.
We thank Ahmad Roaayala, Eko Purnomo, and Vectors Point from Noun Project for allowing us to use the following icons to create the logo: Research Paper, Report Paper, report, and Stats Report.