
BoonTorrent

A Real-Time Monitoring Tool for BitTorrent DHT Traffic

2018 Penn Senior Design Project

First Place in CIS Department

David Cao, Dylan Mann, Alex Moses, Graham Mosley

Abstract

BitTorrent traffic is abundant but difficult to analyze. Capturing enough data for meaningful analysis requires a large, distributed solution. Research firms like Nielsen currently fail to properly account for illegal media consumption. Analyzing BitTorrent traffic would allow firms to study consumer behavior that has been invisible to traditional measures of media popularity.

Our solution is to deploy nodes that listen to the BitTorrent Mainline Distributed Hash Table (DHT). Each node runs a fork of the8472's excellent mldht repository; our fork can be found here. Once peers in the DHT discover a node, it begins routing queries, resolving torrents, and collecting metadata about the queries it receives. We process this data through our pipeline and store the results in Amazon S3 for easy access.
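To illustrate what collecting metadata about received queries can involve: in the Mainline DHT protocol (BEP 5), `get_peers` and `announce_peer` queries carry a 20-byte `info_hash` in their argument dictionary `a`. The Python sketch below is not code from the mldht fork (which is Java). It assumes a KRPC message has already been bencode-decoded into a dict with `bytes` keys, and `extract_info_hash` is a hypothetical helper name.

```python
from typing import Optional, Tuple

# BEP 5 query methods whose arguments include an info_hash.
INFO_HASH_QUERIES = {b"get_peers", b"announce_peer"}


def extract_info_hash(msg: dict) -> Optional[Tuple[str, str]]:
    """Hypothetical helper (not part of mldht).

    Returns (method, hex info_hash) for a decoded KRPC query that names a
    torrent, or None for anything else. `msg` is assumed to be an already
    bencode-decoded message with bytes keys and values.
    """
    if msg.get(b"y") != b"q":  # skip responses (b"r") and errors (b"e")
        return None
    method = msg.get(b"q")
    if not isinstance(method, bytes) or method not in INFO_HASH_QUERIES:
        return None  # e.g. ping or find_node carry no info_hash
    args = msg.get(b"a")
    if not isinstance(args, dict):
        return None
    info_hash = args.get(b"info_hash")
    if not isinstance(info_hash, bytes) or len(info_hash) != 20:
        return None  # malformed: info hashes are 20-byte SHA-1 digests
    return method.decode("ascii"), info_hash.hex()


# Example: a get_peers query, as it might look after bencode decoding.
query = {
    b"t": b"aa",
    b"y": b"q",
    b"q": b"get_peers",
    b"a": {b"id": b"\x01" * 20, b"info_hash": bytes(range(20))},
}
print(extract_info_hash(query))
# ('get_peers', '000102030405060708090a0b0c0d0e0f10111213')
```

Each extracted info hash is one candidate torrent to resolve and one data point to log with a timestamp.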

The main product of BoonTorrent is machine-readable time-series data for research. We also built two proof-of-concept applications on top of that data. The first is a heatmap visualization, updated in real time, that shows the last 2 minutes of traffic; the second is a search engine for locating specific torrent files. In one month, our search engine indexed 1.2 million torrents representing 46 million files totaling nearly 4 petabytes. Our pipeline and both applications run for roughly $10 a day, and we log and analyze roughly 7 million data points daily. Our work shows that it is possible to monitor BitTorrent traffic cost-effectively.
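The heatmap's "last 2 minutes of traffic" is a sliding time window. Below is a minimal sketch of one way to maintain such a window, assuming each data point arrives in timestamp order with a region key (for example, a country code derived from the querying IP; how locations are actually computed is not described here). `SlidingWindowCounter` is a hypothetical name, not part of the app.

```python
from collections import Counter, deque
from typing import Deque, Dict, Tuple


class SlidingWindowCounter:
    """Hypothetical helper: per-key event counts over the last `window_seconds`."""

    def __init__(self, window_seconds: float = 120.0) -> None:
        self.window = window_seconds
        self.events: Deque[Tuple[float, str]] = deque()  # (timestamp, key), oldest first
        self.counts: Counter = Counter()

    def add(self, timestamp: float, key: str) -> None:
        # Assumes events arrive in non-decreasing timestamp order.
        self.events.append((timestamp, key))
        self.counts[key] += 1
        self._evict(timestamp)

    def _evict(self, now: float) -> None:
        # Drop events at or before the window's start, now - window.
        while self.events and self.events[0][0] <= now - self.window:
            _, old_key = self.events.popleft()
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]

    def snapshot(self, now: float) -> Dict[str, int]:
        """Counts for the window ending at `now` (e.g. one heatmap frame)."""
        self._evict(now)
        return dict(self.counts)


window = SlidingWindowCounter(window_seconds=120)
window.add(0, "US")
window.add(30, "DE")
window.add(100, "US")
print(window.snapshot(110))  # {'US': 2, 'DE': 1}
print(window.snapshot(125))  # {'US': 1, 'DE': 1}  (the event at t=0 has expired)
```

Each event is appended once and evicted once, so keeping the window current costs amortized constant time per data point, regardless of how often the heatmap is redrawn.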

Project Structure

| Location | Description |
| --- | --- |
| indexer-lambda | AWS Lambda that indexes resolved torrents, triggered by S3 object creation events. |
| torrent-summary-lambda | AWS Lambda that retrieves a given torrent from S3 and decodes its metadata. |
| prototypes | Prototype implementations. |
| spark-scala | Local Spark processing code. |
| userdata.sh | User data script for EC2 instances. |
| docs | Screenshots and reference material. |
| app | Proof-of-concept web applications written with EJS. |
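`indexer-lambda` is triggered by S3 object creation events. The sketch below shows, in Python, how a handler might pull the bucket and object key out of such an event. It is an illustration under assumptions: the repository's Lambdas may be written in another language, the bucket and key in the sample event are made up, and the indexing step is left as a placeholder. Object keys in S3 event notifications are URL-encoded, which is why the key is passed through `unquote_plus`.

```python
from typing import List, Tuple
from urllib.parse import unquote_plus


def s3_objects_from_event(event: dict) -> List[Tuple[str, str]]:
    """Extract (bucket, key) pairs from an S3 event notification."""
    objects = []
    for record in event.get("Records", []):
        # Only object-creation events, e.g. "ObjectCreated:Put".
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # Keys arrive URL-encoded (a space is sent as '+'), so decode them.
        key = unquote_plus(s3["object"]["key"])
        objects.append((bucket, key))
    return objects


def handler(event, context):
    """Hypothetical Lambda entry point; the indexing step is a placeholder."""
    objects = s3_objects_from_event(event)
    for bucket, key in objects:
        # A real indexer would fetch s3://bucket/key (e.g. with boto3),
        # decode the torrent, and write its metadata to a search index.
        print(f"would index s3://{bucket}/{key}")
    return {"indexed": len(objects)}


# Made-up sample event containing only the fields this sketch reads.
sample_event = {
    "Records": [
        {
            "eventSource": "aws:s3",
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-torrent-bucket"},
                "object": {"key": "torrents/my+file%21.torrent"},
            },
        }
    ]
}
print(handler(sample_event, None))
```

Because one notification can batch several records, the handler iterates over all of them rather than reading only the first.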

An example firehose log file can be found here.

Results

Over one month, we crawled and indexed 1.2 million torrents describing 46 million files totaling nearly 4 PB.

If you're interested in the raw .torrent files or metadata, please file an issue.
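The file counts and total sizes reported above come from each torrent's metadata. Per BEP 3, a .torrent file is a bencoded dictionary whose `info` entry holds either a single `length` (single-file torrent) or a `files` list with one `length` per file (multi-file torrent). The following minimal Python sketch decodes and summarizes one torrent in the spirit of `torrent-summary-lambda`, but it is not that Lambda's code. `bdecode` and `summarize_torrent` are hypothetical names, and the sample torrent is a hand-built minimal example rather than real data.

```python
from typing import Any, Dict, Tuple


def bdecode(data: bytes) -> Any:
    """Decode one bencoded value (BEP 3); byte strings stay as bytes."""
    value, end = _decode(data, 0)
    if end != len(data):
        raise ValueError("trailing data after bencoded value")
    return value


def _decode(data: bytes, i: int) -> Tuple[Any, int]:
    """Decode the value starting at offset i; return (value, next offset)."""
    c = data[i:i + 1]
    if c == b"i":  # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if c == b"l":  # list: l<items>e
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            item, i = _decode(data, i)
            items.append(item)
        return items, i + 1
    if c == b"d":  # dictionary: d<key><value>...e
        i += 1
        result = {}
        while data[i:i + 1] != b"e":
            key, i = _decode(data, i)
            value, i = _decode(data, i)
            result[key] = value
        return result, i + 1
    if c.isdigit():  # byte string: <length>:<bytes>
        colon = data.index(b":", i)
        start = colon + 1
        end = start + int(data[i:colon])
        if end > len(data):
            raise ValueError("truncated byte string")
        return data[start:end], end
    raise ValueError(f"invalid bencode at offset {i}")


def summarize_torrent(raw: bytes) -> Dict[str, Any]:
    """Hypothetical summary: name, number of files, and total size in bytes."""
    info = bdecode(raw)[b"info"]
    name = info[b"name"].decode("utf-8", errors="replace")
    if b"files" in info:  # multi-file torrent: one length per file
        sizes = [f[b"length"] for f in info[b"files"]]
    else:  # single-file torrent: a single top-level length
        sizes = [info[b"length"]]
    return {"name": name, "file_count": len(sizes), "total_bytes": sum(sizes)}


# Hand-built two-file torrent (empty 'pieces'), for illustration only.
demo = (
    b"d4:infod5:filesld6:lengthi100e4:pathl5:a.txteed6:lengthi250e"
    b"4:pathl5:b.txteee4:name4:demo12:piece lengthi16384e6:pieces0:ee"
)
print(summarize_torrent(demo))
# {'name': 'demo', 'file_count': 2, 'total_bytes': 350}
```

Summing `file_count` and `total_bytes` across every indexed torrent would produce aggregates of the kind reported above.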

Screenshots

World Map

Asia Map

Europe Map

Statistics

Search

Search Results

Individual Search Result