Crawl and Visualize ICLR 2024 OpenReview Data

Description

This repository contains code to crawl and visualize data from the ICLR 2024 OpenReview site. Crawling uses parallel requests sent directly to OpenReview's API, which is much faster than Selenium (on the order of 10-100x). It also saves datasets for further analysis, including all reviews and rebuttals as well as PDF metadata and extracted text.
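To illustrate the parallel-request approach, here is a minimal sketch using only the Python standard library. It is not the code from the notebooks: the endpoint URL, the invitation id, the query parameter names, and the helper names fetch_page and crawl are assumptions made for illustration, so check them against the OpenReview API documentation before relying on them.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumptions (not taken from this repository): the API v2 endpoint, the
# invitation id, and the query parameter names below.
API_URL = "https://api2.openreview.net/notes"
INVITATION = "ICLR.cc/2024/Conference/-/Submission"
PAGE_SIZE = 1000


def fetch_page(offset, limit=PAGE_SIZE):
    """Fetch one page of submissions and return it as a list of note dicts."""
    query = urlencode({"invitation": INVITATION, "offset": offset, "limit": limit})
    with urlopen(f"{API_URL}?{query}", timeout=30) as response:
        return json.load(response).get("notes", [])


def crawl(total, fetch=fetch_page, page_size=PAGE_SIZE, workers=8):
    """Fetch every page concurrently and return the notes in offset order."""
    offsets = range(0, total, page_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pages = pool.map(fetch, offsets)  # map() yields results in input order
        return [note for page in pages for note in page]
```

Threads suit this workload because each request spends most of its time waiting on the network. The fetch argument is injectable so the paging logic can be exercised with a stub instead of live requests.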

Usage

Install the dependencies:

pip install -r requirements.txt

Then run the notebooks under the notebooks/ folder in order:

0a. Parse data.ipynb: crawl the data from the OpenReview website: all paper metadata (title, abstract, authors, etc.), reviews, and rebuttals.
0b. Crawl PDF.ipynb: parse the PDF files of the papers to extract the main text.
1. Plots.ipynb: visualize the data using word clouds, bar charts, and other plots.
2. Save Website.ipynb: save the website as a static HTML file.
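
As a rough illustration of the kind of aggregation behind a word cloud or keyword bar chart, the sketch below counts word frequencies across paper titles. It is not taken from the notebooks: the function name word_frequencies and the small stop-word list are hypothetical, and the actual plots may be built differently.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration; a real analysis would
# likely use a much larger one.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def word_frequencies(titles, top_n=10):
    """Return the top_n most frequent non-stop words as (word, count) pairs."""
    counts = Counter()
    for title in titles:
        # Lowercase, then keep alphabetic runs only (drops digits and punctuation).
        words = re.findall(r"[a-z]+", title.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)
```

The resulting (word, count) pairs can be passed to whichever plotting or word-cloud library you prefer.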