Upload data to an AWS DynamoDB instance and measure the time taken, then use Weka to cluster the data.
### Steps:
- Created an AWS DynamoDB instance on an AWS account.
- Created a table on the above instance and set PersonID as the primary key.
- Installed the AWS SDK in Eclipse to access AWS services from Java code.
- Downloaded the Titanic survival data from http://www.cs.toronto.edu/~delve/data/titanic/desc.html.
- Uploaded the data from the `titanic.data` file to DynamoDB using `LoadDataToDynamo.java`.
- Downloaded the Weka API (3.6.11) from http://sourceforge.net/projects/weka/ and configured the Eclipse project to use `weka.jar`.
- To create clusters and visualize the Titanic data, run `WekaRunner.java`. This class does the following:
  - Creates Weka instances after downloading the data from DynamoDB using the AWS API.
  - Passes these Weka instances to a clusterer. The `weka.clusterers.EM` class is used for clustering.
  - Uses the `VisualizePanel` class in Weka to create a cluster visualization of the Titanic data.
- Timings:
  - Total Time Taken [DynamoDB Scan]: 1718 ms
  - Total Time Taken [WekaRunner]: 24356 ms
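
The notes above report timings but do not show how they were measured. One plausible way to produce lines in the same format is to wrap each step in a small wall-clock timer. The sketch below is an assumption, not code from this project: `StepTimer`, `timed`, and `formatTiming` are hypothetical names, and the workload in `main` is a placeholder standing in for the DynamoDB scan or the clustering run.

```java
import java.util.function.Supplier;

/** Hypothetical helper (not part of the original project): times one step and reports it. */
final class StepTimer {

    /** Formats a duration in the same style as the timings listed above. */
    static String formatTiming(String label, long elapsedMillis) {
        return "Total Time Taken [" + label + "]: " + elapsedMillis + " ms";
    }

    /** Runs the step, prints its wall-clock duration, and returns the step's result. */
    static <T> T timed(String label, Supplier<T> step) {
        long start = System.nanoTime();
        T result = step.get();
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        System.out.println(formatTiming(label, elapsedMillis));
        return result;
    }

    public static void main(String[] args) {
        // Placeholder workload; a real run would call the scan or clustering code here.
        int rows = timed("Example Step", () -> {
            int count = 0;
            for (int i = 0; i < 1000; i++) {
                count++;
            }
            return count;
        });
        System.out.println("Rows processed: " + rows);
    }
}
```

`System.nanoTime()` is used rather than `System.currentTimeMillis()` because it is monotonic and so is not affected by system clock adjustments during a run. The measured value still includes network latency and JVM warm-up, so repeated runs will vary.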