
Local relational access to openly available publication data sets

Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0).

Alexandria3k

The alexandria3k package supplies a library and a command-line tool that provide efficient relational query access to diverse open publication data sets. The most important of these is the complete Crossref data set (157 GB compressed, 1 TB uncompressed), which contains metadata for about 134 million publications from all major international publishers, with full citation data for 60 million of them. In addition, the Crossref data set can be linked with the ORCID summary data set (25 GB compressed, 435 GB uncompressed), which contains about 78 million author records, as well as with data sets of funder bodies, journal names, open access journals, and research organizations.
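To illustrate the kind of question that relational access to linked data sets makes easy to answer, the following sketch builds a tiny in-memory SQLite database and joins publication records with author records through ORCID identifiers. The table and column names (`works`, `work_authors`, `persons`, `orcid`, `family_name`) and the sample rows are hypothetical stand-ins invented for this example. They are not the schema alexandria3k actually creates. The point is only the shape of the SQL join that linking Crossref with ORCID enables.

```python
import sqlite3

# Hypothetical, heavily simplified schema: NOT the actual alexandria3k schema.
SCHEMA = """
CREATE TABLE works (id INTEGER PRIMARY KEY, doi TEXT, title TEXT, year INTEGER);
CREATE TABLE work_authors (work_id INTEGER, orcid TEXT);
CREATE TABLE persons (orcid TEXT PRIMARY KEY, family_name TEXT);
"""


def build_sample_db() -> sqlite3.Connection:
    """Create an in-memory database filled with made-up illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO works VALUES (?, ?, ?, ?)", [
        (1, "10.1000/a", "Paper A", 2021),
        (2, "10.1000/b", "Paper B", 2022),
        (3, "10.1000/c", "Paper C", 2022),
    ])
    conn.executemany("INSERT INTO work_authors VALUES (?, ?)", [
        (1, "0000-0000-0000-0001"),
        (2, "0000-0000-0000-0001"),
        (3, "0000-0000-0000-0002"),
    ])
    conn.executemany("INSERT INTO persons VALUES (?, ?)", [
        ("0000-0000-0000-0001", "Smith"),
        ("0000-0000-0000-0002", "Jones"),
    ])
    return conn


def works_per_author(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Count works per ORCID-identified author by joining the linked tables."""
    return conn.execute(
        """
        SELECT persons.family_name, COUNT(DISTINCT works.id) AS n_works
        FROM works
        JOIN work_authors ON work_authors.work_id = works.id
        JOIN persons ON persons.orcid = work_authors.orcid
        GROUP BY persons.orcid
        ORDER BY n_works DESC, persons.family_name
        """
    ).fetchall()


if __name__ == "__main__":
    print(works_per_author(build_sample_db()))  # [('Smith', 2), ('Jones', 1)]
```

The same join pattern extends to the other linked data sets (funders, journals, research organizations), each contributing another table keyed by a shared identifier.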

The alexandria3k package installation contains all elements required to run it. It does not require the installation, configuration, and maintenance of a third-party relational or graph database. It can therefore be used out of the box to perform reproducible publication research on the desktop.
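Because no database server is involved, the result of an analysis can be inspected with nothing beyond the Python standard library. The sketch below assumes that the database alexandria3k populates is an ordinary SQLite file; treat that as an assumption of this illustration. It opens such a file read-only through an SQLite URI, so the inspection cannot alter it, and reports the row count of each table. The function names `table_row_counts` and `demo` and the file name `example.db` are hypothetical.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def table_row_counts(db_path: str) -> dict[str, int]:
    """Return {table_name: row_count} for an SQLite file, opened read-only."""
    # as_uri() percent-encodes the path; mode=ro forbids any modification.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        counts = {}
        for name in tables:
            quoted = name.replace('"', '""')  # escape embedded double quotes
            counts[name] = conn.execute(
                f'SELECT COUNT(*) FROM "{quoted}"').fetchone()[0]
        return counts
    finally:
        conn.close()


def demo() -> dict[str, int]:
    """Build a throwaway database file and inspect it (illustration only)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.db")  # hypothetical file name
        conn = sqlite3.connect(path)
        conn.execute("CREATE TABLE works (doi TEXT)")
        conn.executemany("INSERT INTO works VALUES (?)",
                         [("10.1000/a",), ("10.1000/b",)])
        conn.commit()
        conn.close()  # close before the temporary directory is removed
        return table_row_counts(path)


if __name__ == "__main__":
    print(demo())  # {'works': 2}
```

Opening the file read-only is a small design choice that supports reproducibility: a query script can be rerun against the same database without any risk of changing the data it analyzes.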

Documentation

The complete reference and use documentation for alexandria3k can be found here.

Pre-print and citation

Details about the rationale, design, implementation, and use of this software can be found in the following paper.

Diomidis Spinellis. Open Reproducible Systematic Publication Research. arXiv:2301.13312, February 2023. https://doi.org/10.48550/arXiv.2301.13312