Data-intensive Replications: Scalable, Rapid, and Updating Approaches to Evaluate Research 

A Reading List

This repository attempts to assemble and categorize an emerging body of literature that leverages computational tools, big data sources, and machine learning to evaluate the replicability and reproducibility of research results across the sciences. The focus is on novel works that develop scalable, rapid, and updating approaches to evaluating research results. Although an evaluation system that is simultaneously scalable, rapid, and constantly updating is not (yet) in place, each source in the literature adds a piece of the puzzle.

Scalable: can simultaneously evaluate numerous published studies, hypotheses, results, and claims against purpose-built (e.g., prediction markets) or repurposed (e.g., high-throughput experiments) verification data.

Rapid: can efficiently screen research publications to promptly uncover false positive results, possibly before they propagate in the literature.

Updating: can continuously incorporate new results as they become available.

The distinction between readymade and custommade data that Salganik draws in Bit By Bit: Social Research in the Digital Age is useful here. We extend Salganik's distinction to tentatively categorize automated approaches to research replicability and reproducibility into four classes:

  • Reuse (e.g., aggregation of published results, i.e., meta-analysis, or scalable computational reproducibility)
  • Repurpose (e.g., use of genome-wide association studies and high-throughput experiments to evaluate published results)
  • Crowdsource (e.g., use of prediction markets to aggregate individual beliefs)
  • Simulate (e.g., computer simulations)


Reuse (Reproducibility)




Initiatives & Funding

