This repository contains code to crawl and visualize data from the ICLR 2024 OpenReview site. Crawling is done via parallel requests sent directly to OpenReview's API, which is much faster than Selenium (on the order of 10-100x). It also saves datasets that can be used for further analysis, including all reviews and rebuttals, as well as PDF metadata and extracted text.
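The parallel-request idea can be sketched as follows. This is a minimal illustration, not the code from the notebooks: the endpoint URL, the invitation id, the page size, and the helper names `fetch_page` and `crawl_parallel` are all assumptions made for the example, so check them against the notebooks and the OpenReview API documentation before relying on them.

```python
from concurrent.futures import ThreadPoolExecutor
import json
import urllib.parse
import urllib.request

# Assumed values for illustration only; verify against the notebooks.
API_URL = "https://api2.openreview.net/notes"
INVITATION = "ICLR.cc/2024/Conference/-/Submission"
PAGE_SIZE = 1000


def fetch_page(offset):
    """Fetch one page of notes starting at `offset` (hypothetical helper)."""
    query = urllib.parse.urlencode(
        {"invitation": INVITATION, "offset": offset, "limit": PAGE_SIZE}
    )
    with urllib.request.urlopen(f"{API_URL}?{query}", timeout=30) as resp:
        return json.load(resp).get("notes", [])


def crawl_parallel(offsets, fetch=fetch_page, max_workers=8):
    """Fetch every page concurrently and flatten the results in offset order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map runs the requests in parallel but yields results in input order.
        pages = list(pool.map(fetch, offsets))
    return [note for page in pages for note in page]
```

For example, `crawl_parallel(range(0, 5000, PAGE_SIZE))` would request five pages at once. Threads are a reasonable fit here because the work is I/O-bound; the `fetch` parameter is injectable so the function can be exercised without network access.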
Install the dependencies:
`pip install -r requirements.txt`
Then run the notebooks under the `notebooks/` folder:
- `0a. Parse data.ipynb`: crawl the data from the OpenReview website, including all paper metadata (title, abstract, authors, etc.), reviews, and rebuttals.
- `0b. Crawl PDF.ipynb`: parse the PDF files of the papers to extract the main text.
- `1. Plots.ipynb`: visualize the data using word clouds, bar charts, and other plots.
- `2. Save Website.ipynb`: save the website as a static HTML file.
- Total submitted papers: 4874
- Average rating: 4.94
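Summary figures like these can be derived from the crawled reviews with a few lines of code. The sketch below is an assumption-laden illustration, not the repository's method: it assumes ratings arrive either as integers or as strings that begin with the score followed by a colon, and it averages per-paper means, which may differ from how the number above was computed. The names `parse_rating` and `summarize` are hypothetical.

```python
from statistics import mean


def parse_rating(raw):
    """Extract the leading integer score from a rating value (assumed format)."""
    return int(str(raw).split(":", 1)[0].strip())


def summarize(reviews_by_paper):
    """Return the paper count and the mean of per-paper mean ratings.

    `reviews_by_paper` maps a paper id to a list of raw rating values.
    Papers without any ratings count toward the total but not the average.
    """
    per_paper = [
        mean(parse_rating(r) for r in ratings)
        for ratings in reviews_by_paper.values()
        if ratings
    ]
    return {
        "papers": len(reviews_by_paper),
        "average_rating": round(mean(per_paper), 2) if per_paper else None,
    }
```

Averaging per paper first gives each submission equal weight regardless of how many reviews it received; averaging over all reviews directly is an equally valid choice and would yield a slightly different number.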
Feel free to open an issue or a pull request if you have any feedback or suggestions!
This repository is inspired by the following:
- Initial idea: https://github.com/evanzd/ICLR2021-OpenReviewData
- Previous year's repo: https://github.com/fedebotu/ICLR2022-OpenReviewData
- For web formatting and API requests: https://github.com/weigq/neurips2021_stats and https://github.com/weigq/iclr2022_stats