/yahoo-samoa-research

Research of Yahoo SAMOA

Primary language: Java

Distributed Stream Processing Using SAMOA Framework

Master of Science Final Project Spring 2014

Li Huang, Computer Engineering and Computer Science, Speed School, University of Louisville, April 28, 2014

l0huan08@gmail.com

This is my final project for the master's degree at the University of Louisville. My research mainly focuses on trying out Yahoo SAMOA, a distributed stream data mining platform. I combined 3 computers in our lab into a cluster and set up SAMOA on this cluster.

My main work includes four parts:

1. Build the cluster and set up SAMOA
2. Experiment 1: Test the performance of SAMOA
3. Experiment 2: Implement my own data mining algorithm (non-parallel Naive Bayes) on SAMOA
4. Experiment 3: Implement a parallel Naive Bayes algorithm on SAMOA

Currently, Experiment 3 is not finished because of a bug in my code. The underlying cause is that SAMOA lacks documentation, so I misunderstood the proper usage of some of its functions.
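As a rough illustration of the kind of algorithm Experiments 2 and 3 refer to, here is a minimal sketch of an incremental Naive Bayes classifier for categorical attributes, written in plain Java. It does not use the SAMOA API and is not the code from this project. The class name `StreamingNaiveBayes`, its methods, and the tiny data set in `main` are all hypothetical and chosen only for the example.

```java
/**
 * Hypothetical sketch of an incremental Naive Bayes classifier for
 * categorical attributes. Not part of SAMOA and not this project's code.
 */
public class StreamingNaiveBayes {
    private final int numClasses;
    private final int[] numValues;         // distinct values per attribute
    private final long[] classCounts;      // instances seen per class
    private final long[][][] valueCounts;  // [attribute][class][value]
    private long total;                    // instances seen overall

    public StreamingNaiveBayes(int numClasses, int[] numValues) {
        this.numClasses = numClasses;
        this.numValues = numValues.clone();
        this.classCounts = new long[numClasses];
        this.valueCounts = new long[numValues.length][numClasses][];
        for (int a = 0; a < numValues.length; a++) {
            for (int c = 0; c < numClasses; c++) {
                valueCounts[a][c] = new long[numValues[a]];
            }
        }
    }

    /** Updates the counters with one labeled instance from the stream. */
    public void train(int[] x, int label) {
        classCounts[label]++;
        total++;
        for (int a = 0; a < x.length; a++) {
            valueCounts[a][label][x[a]]++;
        }
    }

    /** Returns the class with the highest log posterior (Laplace smoothing). */
    public int predict(int[] x) {
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < numClasses; c++) {
            double score = Math.log((classCounts[c] + 1.0) / (total + numClasses));
            for (int a = 0; a < x.length; a++) {
                score += Math.log((valueCounts[a][c][x[a]] + 1.0)
                        / (classCounts[c] + numValues[a]));
            }
            if (score > bestScore) {
                bestScore = score;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Two binary attributes, two classes; made-up data for illustration.
        StreamingNaiveBayes nb = new StreamingNaiveBayes(2, new int[] {2, 2});
        int[][] stream = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
        int[] labels = {0, 0, 1, 1};
        for (int i = 0; i < stream.length; i++) {
            nb.train(stream[i], labels[i]);
        }
        System.out.println(nb.predict(new int[] {0, 1}));
    }
}
```

Because training only increments counters, each instance can be processed once and then discarded, which is what makes Naive Bayes a natural fit for stream processing.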

The reports can be found in "White Papers" and "Lab". The presentations can be found in "PPT".