PeerRead

Data and code for "A Dataset of Peer Reviews (PeerRead): Collection, Insights and NLP Applications" by Dongyeop Kang, Waleed Ammar, Bhavana Dalvi, Madeleine van Zuylen, Sebastian Kohlmeier, Eduard Hovy and Roy Schwartz, NAACL 2018

The PeerRead dataset

PeerRead is a dataset of scientific peer reviews available to help researchers study this important artifact. The dataset consists of over 14K paper drafts and the corresponding accept/reject decisions in top-tier venues including ACL, NIPS and ICLR, as well as over 10K textual peer reviews written by experts for a subset of the papers.

We structured the dataset into sections, each corresponding to a venue or an arXiv category, e.g., ./data/acl_2017 and ./data/arxiv.cs.cl_2007-2017. Each section is further divided into train/dev/test splits (the same splits used in the paper). Due to licensing constraints, some sections do not include the data in this repository and instead provide instructions for downloading it, e.g., ./data/nips_2013-2017/README.md.
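The layout described above can be inspected with a short script. The following is a minimal sketch, not part of the repository's code: the function name `list_sections` is hypothetical, and it assumes each section directory holds its splits in subdirectories named `train`, `dev` and `test`. Sections whose data must be downloaded separately may show fewer splits, or none, until their instructions have been followed.

```python
from pathlib import Path

# Assumed split directory names, taken from the "train/dev/test" wording above.
SPLITS = ("train", "dev", "test")


def list_sections(data_root):
    """Map each section directory under data_root to the splits found in it."""
    root = Path(data_root)
    sections = {}
    for section in sorted(p for p in root.iterdir() if p.is_dir()):
        # Keep only the split subdirectories that actually exist on disk.
        sections[section.name] = [s for s in SPLITS if (section / s).is_dir()]
    return sections


if __name__ == "__main__":
    data_dir = Path("./data")
    if data_dir.is_dir():
        for name, splits in list_sections(data_dir).items():
            print(name, splits)
```

Run from the root of the repository, this prints one line per section together with the splits it found, which is a quick way to see which sections still need to be downloaded.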

Models

To experiment with (and hopefully improve) our models for aspect prediction and for predicting whether a paper will be accepted, see ./code/README.md.

Setup Configuration

Run ./setup.sh at the root of this repository to install dependencies and download some of the larger data files not included in this repo.

Citation

@inproceedings{kang18naacl,
  title = {A Dataset of Peer Reviews (PeerRead): Collection, Insights and NLP Applications},
  author = {Dongyeop Kang and Waleed Ammar and Bhavana Dalvi and Madeleine van Zuylen and Sebastian Kohlmeier and Eduard Hovy and Roy Schwartz},
  booktitle = {Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL)},
  address = {New Orleans, USA},
  month = {June},
  url = {https://arxiv.org/abs/1804.09635},
  year = {2018}
}

Acknowledgement