Time Waits for No One

We provide the processed datasets for a subset of our 8 tasks. All released datasets are intended for non-commercial use.

Twitter

For the political affiliation task, we provide labels and tweet IDs but omit the tweet content, in accordance with the Twitter License Agreement. These tweets were collected via the Twitter API for Academic Research and are intended for non-commercial use.
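Because only IDs and labels are released, the tweet text has to be retrieved separately and joined back onto the labels. The following is a minimal sketch of that join, not the authors' pipeline. The column names `tweet_id` and `label`, the CSV layout, the label values, and the `attach_text` helper are all assumptions made for illustration; check the released files for the actual format.

```python
import csv
import io


def attach_text(label_rows, hydrated):
    """Join (tweet ID, label) rows with separately retrieved tweet text.

    label_rows: iterable of dicts with the hypothetical keys "tweet_id" and "label".
    hydrated:   dict mapping tweet ID (as a string) to tweet text.
    Returns (examples, missing_ids). IDs with no retrieved text (for example
    deleted or protected tweets) are reported instead of silently dropped.
    """
    examples, missing = [], []
    for row in label_rows:
        tweet_id = row["tweet_id"].strip()
        text = hydrated.get(tweet_id)
        if text is None:
            missing.append(tweet_id)
        else:
            examples.append({"tweet_id": tweet_id, "label": row["label"], "text": text})
    return examples, missing


# Toy stand-in for the released ID/label file; the layout and labels are placeholders.
labels_csv = io.StringIO("tweet_id,label\n101,label_a\n102,label_b\n103,label_a\n")
# Toy stand-in for text retrieved from the Twitter API.
hydrated = {"101": "example text a", "103": "example text b"}

examples, missing = attach_text(csv.DictReader(labels_csv), hydrated)
print(len(examples), "examples;", len(missing), "missing")
```

Keeping the list of missing IDs makes it easy to report how much of the original dataset could still be recovered at retrieval time.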

For the Twitter NER data, please see Shruti's GitHub. For reproducibility, we provide scripts for processing this data.

Media Frames Corpus

We provide scripts for processing the Media Frames Corpus. Please see here.

Newsroom

For both newsroom summarization and publisher classification, we used the Newsroom dataset. We provide scripts for processing the data.

SciERC

We use the SciERC dataset. We also provide scripts for processing this data.

AI Publisher

We use data from the Semantic Scholar API, which is licensed under the ODC-BY license. We release our data splits for this task.

Yelp

Please see the Yelp dataset.