Bespin

Bespin is a library that contains reference implementations of "big data" algorithms in MapReduce and Spark. This repo contains datasets used in Bespin demos:

The file Shakespeare.txt contains The Complete Works of William Shakespeare from Project Gutenberg.
The file p2p-Gnutella08-adj.txt contains a snapshot of the Gnutella peer-to-peer file sharing network from August 2002, where nodes represent hosts in the Gnutella network topology and edges represent connections between the Gnutella hosts. This dataset is available from the Stanford Network Analysis Project.
The tarball taxi-data.tar.gz contains a one-day slice of NY taxi data, split into one file per minute. For analyses of the full dataset, see Todd Schneider's blog post "Analyzing 1.1 Billion NYC Taxi and Uber Trips, with a Vengeance".
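
As a rough illustration of working with the graph dataset, the sketch below parses adjacency-list lines into a Python dictionary and computes out-degrees. The file layout is an assumption, not something this README states: each line is taken to be whitespace-separated, with a source node ID followed by the IDs of its neighbors. The function names `parse_adjacency_lines` and `out_degrees` are hypothetical helpers and are not part of Bespin.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def parse_adjacency_lines(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Parse adjacency-list lines into {source: [neighbor, ...]}.

    Assumed layout (not confirmed by the README): each non-empty line is
    whitespace-separated, with the source node ID first and zero or more
    neighbor IDs after it.
    """
    graph: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        source, neighbors = tokens[0], tokens[1:]
        # Indexing creates the entry even when the node has no neighbors.
        graph[source].extend(neighbors)
    return dict(graph)


def out_degrees(graph: Dict[str, List[str]]) -> Dict[str, int]:
    """Return the number of outgoing edges for each source node."""
    return {node: len(neighbors) for node, neighbors in graph.items()}
```

If the assumed layout holds, the file can be read with `with open("p2p-Gnutella08-adj.txt") as f: graph = parse_adjacency_lines(f)`. Node IDs are kept as strings so that the sketch makes no assumption about their numeric range.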

