Steps to Run Project Components
Part-1: Running Clustering Algorithms
- Extract project2.jar from Project.zip to any location on your system.
- Create a data folder in the same directory as project2.jar and add the data files you want to run (e.g. cho.txt, iyer.txt).
- Use the following commands, from the directory containing project2.jar, to run a specific clustering algorithm:
- KMeans: java -cp project2.jar com.ub.cse601.project2.run.RunKMeans
- Hierarchical Clustering: java -cp project2.jar com.ub.cse601.project2.run.RunHierarchialClustering
- DBScan: java -cp project2.jar com.ub.cse601.project2.run.RunDBScan
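The setup steps above can be scripted. The following is only a minimal sketch: the function name prepare_data_dir and the source paths in the usage comment are hypothetical, and it assumes a POSIX shell with project2.jar already extracted into the target directory.

```shell
# Hypothetical helper (not part of the project): creates the data folder
# next to project2.jar and copies the given data files into it.
# Usage: prepare_data_dir <dir-containing-project2.jar> <data-file>...
prepare_data_dir() {
    base_dir="$1"
    shift
    # Create <base_dir>/data if it does not exist yet.
    mkdir -p "$base_dir/data" || return 1
    # Copy every remaining argument (e.g. cho.txt, iyer.txt) into data/.
    for f in "$@"; do
        cp "$f" "$base_dir/data/" || return 1
    done
}

# Example (paths are placeholders):
#   prepare_data_dir . /path/to/cho.txt /path/to/iyer.txt
#   java -cp project2.jar com.ub.cse601.project2.run.RunKMeans
```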
Part-2: Running KMeans Map Reduce
- Follow steps 1 and 2 from Part-1.
- Create two new folders, input and centroids, inside the data folder created in the previous step.
- Use the following command to run KMeans MR: java -cp project2.jar com.ub.cse601.project2.run.RunKMeansMR
- Once the MR jobs complete successfully, the centroids folder contains the initial centroid file used for MR (centroids_0.txt) as well as the intermediate centroid files written in each iteration of the MR job (centroid_1.txt, centroid_2.txt, ...). The last centroid file generated holds the final converged centroids.
- Finally, the output folder contains the final output of the MR jobs: the final centroids and the data points assigned to each cluster.
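The folder setup for Part-2 and the lookup of the final centroid file can be sketched as below. Both function names are hypothetical helpers, not part of project2.jar. The sketch assumes a POSIX shell and that centroid files are named centroid_N.txt or centroids_N.txt with N the iteration number, as in the examples above.

```shell
# Hypothetical helper: creates data/input and data/centroids under the
# directory that contains project2.jar.
prepare_mr_dirs() {
    base_dir="$1"
    mkdir -p "$base_dir/data/input" "$base_dir/data/centroids"
}

# Hypothetical helper: prints the name of the centroid file with the
# highest iteration number in the given centroids folder.
latest_centroid() {
    dir="$1"
    # Keep only names matching the assumed pattern, sort numerically on the
    # part after the underscore, and print the last (highest) entry.
    (cd "$dir" && ls | grep -E '^centroids?_[0-9]+\.txt$' \
        | sort -t_ -k2,2n | tail -n 1)
}

# Example:
#   prepare_mr_dirs .
#   java -cp project2.jar com.ub.cse601.project2.run.RunKMeansMR
#   latest_centroid data/centroids
```

The numeric sort matters once a run exceeds nine iterations, since a plain alphabetical listing would place centroid_10.txt before centroid_2.txt.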