ddm-akka

Akka example and homework code for the "Big Data Systems" lecture.

Requirements

Java Version >= 11
Maven Compiler Version >= 3.8.1

Getting started

Clone repo

git clone https://github.com/UMR-Big-Data-Analytics/ddm-akka.git

Decompress test data

cd ddm-akka/data
unzip TPCH.zip

Build project with maven

cd ..
mvn package

Read the program documentation

java -jar target/ddm-akka-1.0.jar

First run

java -jar target/ddm-akka-1.0.jar master

Distributed run (locally on one machine)

# Run a master
java -Xms2048m -Xmx2048m -jar target/ddm-akka-1.0.jar master -w 0
# Run a worker (repeat for multiple workers)
java -Xms2048m -Xmx2048m -jar target/ddm-akka-1.0.jar worker -w 1

-Xms and -Xmx are Java Virtual Machine options that set the initial and maximum heap size, respectively. To ensure that your program runs on the Pi cluster, set the maximum heap size to no more than two gigabytes (-Xmx2048m or -Xmx2g; note that there is no equals sign between the option and its value).
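To confirm that the heap options took effect, the JVM can report its own limit. The following is a minimal sketch using the standard java.lang.Runtime API; the class name HeapCheck and the LIMIT_MIB constant are illustrative assumptions and not part of ddm-akka.

```java
public class HeapCheck {
    // Illustrative limit matching the two-gigabyte recommendation above (assumption).
    static final long LIMIT_MIB = 2048;

    // Converts a byte count to whole mebibytes.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    // True if the given maximum heap size is within the illustrative limit.
    static boolean withinLimit(long maxHeapBytes) {
        return toMiB(maxHeapBytes) <= LIMIT_MIB;
    }

    public static void main(String[] args) {
        // maxMemory() is the maximum amount of memory the JVM will attempt to use.
        long max = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap (MiB): " + toMiB(max));
        System.out.println("Within limit: " + withinLimit(max));
    }
}
```

With Java 11 or newer, this can be run directly as a single source file, for example java -Xmx2048m HeapCheck.java. Depending on the garbage collector, the reported value may be somewhat below the configured maximum.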

Distributed run (on multiple machines)

# Run a master
java -Xms2048m -Xmx2048m -jar target/ddm-akka-1.0.jar master -w 0 -h <your-ip-address>
# Run a worker (repeat for multiple workers)
java -Xms2048m -Xmx2048m -jar target/ddm-akka-1.0.jar worker -w 1 -mh <master-host-ip> -h <your-ip-address>

Note that you need to substitute <your-ip-address> and <master-host-ip> with your own IP address and the master's IP address, respectively. You can use websites like whatismyipaddress.com or command-line utilities like hostname -I and ifconfig to find these IP addresses.
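As an alternative to the tools mentioned above, the addresses can be listed from Java itself. This is a minimal sketch using the standard java.net.NetworkInterface API; the class name ListAddresses and its method are illustrative assumptions and not part of ddm-akka.

```java
import java.io.UncheckedIOException;
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

public class ListAddresses {
    // Returns "<interface> <address>" for every IPv4 address on an interface
    // that is up and is not the loopback interface.
    static List<String> localIPv4Addresses() {
        List<String> result = new ArrayList<>();
        try {
            Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
            if (nics == null) {
                return result; // no network interfaces found
            }
            for (NetworkInterface nic : Collections.list(nics)) {
                if (!nic.isUp() || nic.isLoopback()) {
                    continue;
                }
                for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                    if (addr instanceof Inet4Address) {
                        result.add(nic.getName() + " " + addr.getHostAddress());
                    }
                }
            }
        } catch (SocketException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        localIPv4Addresses().forEach(System.out::println);
    }
}
```

The addresses listed this way are the ones assigned to the machine's own interfaces. A website typically reports the public address instead, which differs from the local one when the machine sits behind a router that performs address translation.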

Hints

Run java -jar target/ddm-akka-1.0.jar without arguments to have a help text printed to your console. It will describe all parameters in detail.
Use java -Xms2048m -Xmx2048m to restrict your program run to two gigabytes of heap memory. This ensures that it runs on the Pi cluster.
Use LargeMessageProxy to process large messages.
For checking memory usage, see this thread.
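One way to check heap usage from inside a running program is the standard java.lang.management API. The sketch below is an assumption about what such a check might look like and is not necessarily the approach the referenced thread describes; the class name HeapUsage and its method are illustrative and not part of ddm-akka.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapUsage {
    private static final long MIB = 1024L * 1024L;

    // Formats a heap snapshot in whole mebibytes; a max of -1 means "undefined".
    static String describe(MemoryUsage usage) {
        String max = usage.getMax() < 0 ? "undefined" : (usage.getMax() / MIB) + "MiB";
        return "used=" + (usage.getUsed() / MIB) + "MiB"
                + " committed=" + (usage.getCommitted() / MIB) + "MiB"
                + " max=" + max;
    }

    public static void main(String[] args) {
        // Snapshot of the current heap usage of this JVM.
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.println(describe(heap));
    }
}
```

Calling describe at a few points during a run, for example before and after loading a large input file, gives a rough picture of how close the program gets to the configured maximum.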