We provide the processed datasets for a subset of our 8 tasks. All released datasets are intended for non-commercial use.
For the political affiliation task, we provide labels and tweet IDs and omit the tweet content, in accordance with the Twitter License Agreement. These tweets were collected via the [Twitter API for Academic Research] and are intended for non-commercial use.
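Because only tweet IDs and labels are released, the text has to be retrieved separately before the labels can be used. The sketch below shows one way to join retrieved text back onto the labels. It is not part of the released scripts: the column names `tweet_id` and `label`, the placeholder label values, the function name `attach_text`, and the `hydrated` mapping are all hypothetical, and the real release may use a different file layout.

```python
import csv
import io


def attach_text(label_rows, hydrated):
    """Join (tweet_id, label) rows with separately retrieved tweet text.

    label_rows: iterable of dicts with hypothetical keys "tweet_id" and "label".
    hydrated:   dict mapping tweet ID (as a string) to tweet text.
    Returns (merged_rows, missing_ids); IDs with no retrieved text are
    reported in missing_ids rather than silently dropped.
    """
    merged, missing = [], []
    for row in label_rows:
        tweet_id = row["tweet_id"]
        text = hydrated.get(tweet_id)
        if text is None:
            missing.append(tweet_id)
        else:
            merged.append(
                {"tweet_id": tweet_id, "label": row["label"], "text": text}
            )
    return merged, missing


# Toy stand-in for a label file; the real file format is an assumption.
labels_csv = "tweet_id,label\n1001,A\n1002,B\n1003,A\n"
label_rows = list(csv.DictReader(io.StringIO(labels_csv)))

# Toy stand-in for retrieved tweets; tweet 1002 is no longer available.
hydrated = {"1001": "example text a", "1003": "example text b"}

merged, missing = attach_text(label_rows, hydrated)
print(len(merged), "tweets with text;", len(missing), "unavailable")
```

Tweet IDs are kept as strings here because they are 64-bit integers and can lose precision if parsed as floating-point numbers. Deleted or protected tweets cannot be retrieved, so the recovered set may be smaller than the released label set.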
For the Twitter NER data, please see Shruti's GitHub. For reproducibility, we provide scripts for processing this data.
We provide scripts for processing the media frames corpus. Please see here.
For both newsroom summarization and publisher classification, we used the Newsroom dataset. We provide scripts for processing the data.
We use the SciERC dataset. We also provide scripts for processing this data.
We use data from the Semantic Scholar API, which is licensed under the ODC-BY license. We release our data splits for this task.
Please see the Yelp dataset.