ugproject

Team Project on UG

Primary language: JavaScript. License: The Unlicense.

What is it?

This is a University of Gdansk team project created to explore the Reddit data available for research. The dataset consists of approximately 1.7 billion comments (250 GB compressed).

The purpose

The aim of this project is to run a set of queries against the dataset using Pig scripts, together with user-defined functions (UDFs) written in Scala for those scripts. The goal is for everything to eventually run on Amazon Elastic MapReduce (EMR).
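Pig UDFs are ordinary JVM classes. In Scala, one would typically extend `org.apache.pig.EvalFunc` and implement `exec(input: Tuple)`, reading fields with `input.get(i)`. The sketch below is a hypothetical example, not code from this project: it isolates the per-record logic such a UDF might wrap (a word count over a comment body, with the name `wordCount` chosen for illustration) so it runs with only the Scala standard library.

```scala
object CommentUdfSketch {
  /** Hypothetical per-record UDF logic: counts whitespace-separated words
    * in a comment body. Returns None for a missing (null) body, mirroring
    * how a Pig UDF would return null for a null input field. */
  def wordCount(body: String): Option[Int] =
    Option(body).map(_.split("\\s+").count(_.nonEmpty))

  def main(args: Array[String]): Unit = {
    // In an actual Pig UDF, this call would sit inside exec(input: Tuple).
    println(wordCount("Hello Reddit, hello Pig"))
    println(wordCount(null))
  }
}
```

Keeping the logic in a plain function like this makes it testable without a Hadoop cluster; the `EvalFunc` subclass then only needs to unpack the tuple and delegate.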

Amazon EMR is a managed Hadoop framework for distributing and processing large amounts of data across dynamically scalable Amazon EC2 instances.

Contributors

Krzysztof Grajek
Maciej Rudnicki