Distributed Stream Processing Using SAMOA Framework
Master of Science Final Project Spring 2014
Li Huang, Computer Engineering and Computer Science, Speed School, University of Louisville, April 28, 2014
====================

This is my final project for the master's degree at the University of Louisville. My research mainly focuses on trying out Yahoo SAMOA, a distributed stream data mining platform. I combined three computers in our lab into a cluster and set up SAMOA on it.
My main work includes four parts:

1. Build the cluster and set up SAMOA
2. Experiment 1: Test the performance of SAMOA
3. Experiment 2: Implement my own data mining algorithm (non-parallel Naive Bayes) on SAMOA
4. Experiment 3: Implement a parallel Naive Bayes algorithm on SAMOA
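
To give a rough idea of what the non-parallel Naive Bayes in Experiment 2 involves, here is a minimal sketch of an incremental learner that updates its counts one instance at a time, as a stream learner must. It is plain Java and does not use SAMOA's API. The class name `StreamingNaiveBayes`, its methods, the restriction to categorical attributes, and the Laplace smoothing are all assumptions made for illustration, not the code from this project.

```java
/** Illustrative incremental Naive Bayes for categorical attributes (hypothetical, not SAMOA code). */
class StreamingNaiveBayes {
    private final int numClasses;
    private final int[] numValues;         // number of distinct values per attribute
    private final long[] classCounts;      // instances seen so far, per class
    private final long[][][] valueCounts;  // indexed as [class][attribute][value]
    private long total;                    // instances seen so far, all classes

    StreamingNaiveBayes(int numClasses, int[] numValues) {
        this.numClasses = numClasses;
        this.numValues = numValues.clone();
        this.classCounts = new long[numClasses];
        this.valueCounts = new long[numClasses][numValues.length][];
        for (int c = 0; c < numClasses; c++) {
            for (int a = 0; a < numValues.length; a++) {
                valueCounts[c][a] = new long[numValues[a]];
            }
        }
    }

    /** Update the counts with a single labeled instance from the stream. */
    void train(int[] x, int label) {
        classCounts[label]++;
        total++;
        for (int a = 0; a < x.length; a++) {
            valueCounts[label][a][x[a]]++;
        }
    }

    /** Return the class with the highest log-posterior, using add-one (Laplace) smoothing. */
    int predict(int[] x) {
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < numClasses; c++) {
            // Smoothed log prior of class c.
            double score = Math.log((classCounts[c] + 1.0) / (total + numClasses));
            for (int a = 0; a < x.length; a++) {
                // Smoothed log likelihood of attribute a taking value x[a] given class c.
                score += Math.log((valueCounts[c][a][x[a]] + 1.0)
                        / (classCounts[c] + numValues[a]));
            }
            if (score > bestScore) {
                bestScore = score;
                best = c;
            }
        }
        return best;
    }
}
```

A caller would invoke `train` once per arriving instance and `predict` whenever a label is needed. Because the whole model is a set of counts, the counts collected by separate workers can in principle be added together, which is one reason Naive Bayes is a natural candidate for a parallel version.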
Currently, Experiment 3 is not finished because of a bug in my code. The underlying cause is that SAMOA lacks documentation, so I misunderstood the proper usage of some of its functions.
The reports can be found in "White Papers" and "Lab". The presentations can be found in "PPT".