
Repository of data on web domains.

Primary Language: Python


A repository for aggregating web domain metrics, such as partisanship or veracity classifications, from peer-reviewed publications. All data gathering and aggregation can be replicated by running bash replicate.sh. If you're looking for the final product, see data/domains.tsv.
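For readers who only want the final product, here is a minimal sketch of reading data/domains.tsv with Python's standard library. The function name load_domains is hypothetical and not part of this repository, and the sketch assumes the file has a header row and tab-separated columns.

```python
import csv
from pathlib import Path


def load_domains(path="data/domains.tsv"):
    """Read a tab-separated file into a list of dicts, one per row.

    Assumes the first line is a header row naming the columns.
    """
    with Path(path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))
```

Calling load_domains() from the repository root returns one dict per row, keyed by the column names found in the header, so the available metrics can be inspected with rows[0].keys().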

News classifications are available in the news_is_news column and are defined using:

  1. 488 domains identified as ‘hard news’ by Bakshy et al. (2015)
  2. 1,250 domains manually identified as news by Grinberg et al. (2019), and
  3. 6,288 domains aggregated from local news listings by Yin (2018)
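One plausible reading of the definition above is that a domain counts as news if it appears in any of the three source lists. The sketch below illustrates that union logic only; it is not taken from replicate.sh. The names normalize and label_news are hypothetical, and the lowercase and "www." handling is an assumption about how domains might be matched.

```python
def normalize(domain):
    """Lowercase a domain and drop a leading 'www.' (assumed matching rule)."""
    domain = domain.strip().lower()
    return domain[4:] if domain.startswith("www.") else domain


def label_news(domains, *news_lists):
    """Map each domain to True if it appears in any of the given news lists."""
    news = set()
    for news_list in news_lists:
        news.update(normalize(d) for d in news_list)
    return {d: normalize(d) in news for d in domains}
```

Here news_lists would hold the three sets of domains from Bakshy et al. (2015), Grinberg et al. (2019), and Yin (2018), loaded however those sources are distributed.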

Currently includes data from:

Grinberg, N., Joseph, K., Friedland, L., Swire-Thompson, B., & Lazer, D. (2019). Fake news on Twitter during the 2016 US presidential election. Science, 363(6425), 374-378. Download data

Robertson, R. E., Jiang, S., Joseph, K., Friedland, L., Lazer, D., & Wilson, C. (2018). Auditing Partisan Audience Bias within Google Search. Proceedings of the ACM on Human-Computer Interaction, 2(CSCW), 148. Download data

Leon Yin. (2018). yinleon/LocalNewsDataset: Initial release (V1.0). Zenodo. https://doi.org/10.5281/zenodo.1345145

Robertson et al. (2018) includes data from:

  • AllSides. 2018. Media Bias Ratings. AllSides. (2018). Download data
  • Amy Mitchell, Jeffrey Gottfried, Jocelyn Kiley, and Katerina Eva Matsa. 2014. Political Polarization & Media Habits. Pew Research Center’s Journalism Project. (Oct. 2014). Download data
  • Ceren Budak, Sharad Goel, and Justin M Rao. 2016. Fair and balanced? Quantifying media bias through crowdsourced content analysis. Public Opinion Quarterly 80, S1 (2016), 250–271. Download data
  • Eytan Bakshy, Solomon Messing, and Lada A Adamic. 2015. Exposure to ideologically diverse news and opinion on Facebook. Science 348, 6239 (2015), 1130–1132. Download data