/arxiv-data-exploration

Exploratory data analysis and tooling on ArXiv (https://arxiv.org/) Metadata and Data

Primary LanguageJupyter NotebookMIT LicenseMIT

arxiv-data-exploration

Exploratory data analysis and tooling on ArXiv (https://arxiv.org/) Metadata and Data.

Dataset

ArXiv publicly exposes it's metadata and data (the actual PDF files) on it's scholarly articles for anyone to download and extract useful information as needed. You can figure what are the most cited articles and authors, what are the field defining articles, find trends in research, etc. with it.

Instead of directly using ArXiv public API, I have instead chosen to use Kaggle's compiled metadata snapshot on more than 2.5 million scholarly articles. The snapshot itself is updated every week, so, you should have updated information every week.

Explorations

Trend Exploration

Data: To start with any notebook, you must download the dataset available on Kaggle's website alongside the notebook.

Notebook: arxiv_data_exploration.ipynb