A list of resources for scholarly data analysis, including datasets, papers, and code for bibliometrics, citation analysis, and other scholarly commons resources.
- Awesome Scholarly Data Analysis
- Table of Contents
- Datasets
- Tools
- Publication Venues
- Summer Schools
- Associations & Community
- Contributions
Table of contents generated with markdown-toc
- Arnet Miner
- Microsoft Academic Graph
- Open Academic Graph - MAG + AMiner
- Semantic Scholar Corpus
- CiteSeer
- PubMed
- CORA datasets
- CrossRef DOI URLs
- DBLP Citation dataset
- NBER Patent Citations
- Scopus Citation Database
- Papers, patents, and grants from Indiana University
- Small Network Data - Mark Newman's Lab
- The Koblenz Network Collection
- Google Scholar citation relations
- Open citations project
- Wikicite Project
- Economic Papers
- ArXiv data dump
- Complete ACL anthology as bibtex file
- ACL Anthology Reference Corpus
- Astrophysics Data System (ADS) - All physics papers
- CORE 37M full text open access papers
- Inspire database for high energy physics articles
- Scholarly Data of workshops and conferences in RDF triples
- The Collection of Computer Science Bibliographies
- OpenCitations corpus
- COCI DOI-to-DOI citation data
- DOAJ API (Directory of Open Access Journals)
- ROAD (Directory of Open Access Scholarly Resources)
- Sherpa/Romeo (Publisher copyright policies & self-archiving)
- OpenAPC (fees paid for open access journal articles)
- OSF API (Open Science Framework)
- Digital tools for researchers
- Mathematics Genealogy Project
- Academic Tree - Cross discipline academic genealogies
- MPACT project - Library Sciences
- PhDTree
- Chemistry Genealogy - curated at UIUC
- Notre Dame Genealogy Project
- UIUC Chemistry, Chemical Engineering, and Biochemistry
- Software Engineering Academic Genealogy
- Other lists of genealogy projects
- Wikipedia - Computer Science Genealogy
- Wikipedia - Theoretical Physicists Genealogy
- Wikipedia - Chemists Genealogy
- SCIENTIFIC GENEALOGY MASTER LIST - Scientists Associated with Concepts in Chemistry & Physics
- Economic Genealogy Text Format
- Temporal profiles of PubMed authors
- ORCID data dump
- National Library of Medicine Profiles
- UIUC Professors database - Publications, Affiliations
- Author Profiles of scholarly authors in Wikipedia
- INSPIRE dataset
- Lee Giles dataset
- Cleaner version of Lee Giles dataset
- DBLP Korean Authors
- Arnet Miner
- DBLP Name disambiguation dataset
- rexa-coref-data
- Open Access Theses and Dissertations
- The Networked Digital Library of Theses and Dissertations (NDLTD)
- PhD Dissertations in the Area of Software Engineering
- ProQuest Dissertations & Theses Global
- Citation Parsing
- Document Summarization
- Keyphrase Extraction
- Related Work Summarization
- Biomedical NLP annotated datasets
- Chemical compound and drug name recognition task
- Semantic Scholar Dataset
- ScienceIE
- ACL RD TEC 2.0 also at @CLARIN
- SEPID Corpus - Segmented ACL ARC 1.0
- PubMed Central Open Access - BioC
- PubMed Fulltext - protein-protein and genetic interactions
- BioNLP - Argo
- Biomedical NLP - Stav
- GENIA - BioNLP 2011
- Genia Treebank used for SciSpacy training - SciSpacy link
- Full GENIA corpus
- Anatomical Entity Mention (AnEM) corpus
- CellFinder - Entity detection
- Multi-Level Event Extraction (MLEE)
- Biomedical sentence simplification
- PubMed - Colorado Richly Annotated Full-Text
- Biomedical NER datasets related publication
- BioVerbNet
- Lunar and Planetary Science abstracts for NER and Relations
- ACM data affiliations
- ACM - DBLP database entry matching
- Colorado Richly Annotated Full-Text - PubMed abstract annotated with entities mapped to 10 biomedical ontology terms.
- CLEF datasets for multilingual Biomedical NLP+IE
- MedMentions - UMLS entities in PubMed
- SciGraph Springer Nature
- Medical Subject Headings maintained by the National Library of Medicine of the United States
- Computer Science Ontology maintained by Scholarly Knowledge: Modeling, Mining and Sense Making
- Physics Subject Headings maintained by American Physical Society (APS)
- Open Biological and Biomedical Ontology (OBO) maintained by the OBO Foundry
- ACM Computing Classification System maintained by the Association for Computing Machinery
- Physics and Astronomy Classification Scheme (PACS) maintained by the American Institute of Physics (AIP); discontinued in 2010 and replaced by Physics Subject Headings
- Mathematics Subject Classification (MSC) maintained by Mathematical Reviews and zbMATH
- Journal of Economic Literature (JEL) maintained by the American Economic Association
- STW Thesaurus for Economics maintained by ZBW - Leibniz Information Centre for Economics
- Australian and New Zealand Standard Research Classification (ANZSRC) maintained by the Australian Bureau of Statistics. It consists of three sub-classification schemes:
  - Fields of Research (FoR) classification
  - Research Fields, Courses and Disciplines (RFCD) classification
  - Socio-Economic Objective (SEO) classification
- Library of Congress Classification (LCC) maintained by Library of Congress
- Fields of Study (FoS) maintained by Microsoft Academic
- Google Scholar
- Semantic Scholar
- Microsoft Academic Graph
- AceMap
- GitXiv
- ACL Anthology
- NIPS papers
- Abel tools for PubMed data
- infolis: linking research data and publications
- Metrics toolkit
- Rcrossref (R library)
- Rscopus (R library)
- Scholar (R library)
- Bibliometrix (R library)
- CITAN (R library)
- BibeR (a web-based tool for bibliometric analysis of scientific literature)
- scihub.py (Python library)
- SoPaper (Python library)
- CiteSeer tools
- Novelty quantification in PubMed articles
- Biomedical - BioSentVec Embeddings
- Biomedical embeddings - CambridgeLTL
- NIH scientific paper pre-processing
- SciSpacy - spaCy models for biomedical NLP from AllenAI
- Frontiers in Research Metrics and Analytics
- Scientometrics
- Journal of Informetrics
- Quantitative Science Studies (Open Access)
- Science, Technology, & Human Values
- Social Studies of Science
- Science and Public Policy
- Joint Conference on Digital Libraries (JCDL)
- International Conference on Theory and Practice of Digital Libraries (TPDL)
- European Semantic Web Conference (ESWC), Research of Research Track
- STI Conference series (Science and Technology Indicators, e.g., 2018)
- ISSI Conference series (International Conference on Scientometrics & Informetrics, e.g., 2019)
- SIGMET - Metrics workshop
- International Workshop on Mining Scientific Publications
- Semantics, Analytics, Visualisation: Enhancing Scholarly Dissemination (SAVE-SD)
- Workshop on Reframing Research (RefResh)
- Enabling Open Semantic Science (SemSci)
The following people have contributed to the items on this list.
- Shubhanshu Mishra - Maintainer of the list.
- Angelo Antonio Salatino
- Philipp Zumstein
- Ali (Aliakbar Akbaritabar)