Akka example and homework code for the "Big Data Systems" lecture.
- Java Version >= 11
- Maven Compiler Version >= 3.8.1
- Clone repo
git clone https://github.com/UMR-Big-Data-Analytics/ddm-akka.git
- Decompress test data
cd ddm-akka/data
unzip TPCH.zip
- Build project with maven
cd ..
mvn package
- Read the program documentation
java -jar target/ddm-akka-1.0.jar
- First run
java -jar target/ddm-akka-1.0.jar master
- Distributed run (locally on one machine)
# Run a master
java -Xms2048m -Xmx2048m -jar target/ddm-akka-1.0.jar master -w 0
# Run a worker (repeat for multiple workers)
java -Xms2048m -Xmx2048m -jar target/ddm-akka-1.0.jar worker -w 1
The JVM options -Xms and -Xmx set the initial and maximum heap size, respectively. To ensure that your program runs on the Pi cluster, set the maximum heap size to no more than two gigabytes (-Xmx2048m or -Xmx2g). Note that the size follows the option name directly, without an equals sign.
- Distributed run (on multiple machines)
# Run a master
java -Xms2048m -Xmx2048m -jar target/ddm-akka-1.0.jar master -w 0 -h <your-ip-address>
# Run a worker (repeat for multiple workers)
java -Xms2048m -Xmx2048m -jar target/ddm-akka-1.0.jar worker -w 1 -mh <master-host-ip> -h <your-ip-address>
Note that you need to substitute <your-ip-address> and <master-host-ip> with your own and the master's IP address, respectively. You can use websites like whatismyipaddress.com or command-line utilities like hostname -I and ifconfig to get these IP addresses.
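If neither hostname -I nor ifconfig is available on a machine, the addresses can also be listed from Java itself. The following is a minimal sketch that is not part of the ddm-akka code base; the class name LocalAddresses and the method candidateAddresses are made up for illustration. It prints the IPv4 addresses of all network interfaces that are up and not loopback, and you still have to pick the one that the other machines can actually reach.

```java
import java.io.UncheckedIOException;
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

// Hypothetical helper, not part of ddm-akka.
public class LocalAddresses {

    /** Returns the IPv4 addresses of all interfaces that are up and not loopback. */
    static List<String> candidateAddresses() {
        List<String> result = new ArrayList<>();
        try {
            Enumeration<NetworkInterface> interfaces = NetworkInterface.getNetworkInterfaces();
            if (interfaces == null) {
                return result; // no network interfaces found on this machine
            }
            for (NetworkInterface nic : Collections.list(interfaces)) {
                if (!nic.isUp() || nic.isLoopback()) {
                    continue; // skip inactive interfaces and 127.0.0.1
                }
                for (InetAddress address : Collections.list(nic.getInetAddresses())) {
                    if (address instanceof Inet4Address) {
                        result.add(address.getHostAddress());
                    }
                }
            }
        } catch (SocketException e) {
            // Rethrow unchecked so callers do not need a throws clause.
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        candidateAddresses().forEach(System.out::println);
    }
}
```

With Java 11 or newer, the file can be run directly with java LocalAddresses.java. One of the printed addresses can then be passed as -h on that machine, or as -mh on the workers if the machine hosts the master.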
- Run java -jar target/ddm-akka-1.0.jar without arguments to print a help text to your console. It describes all parameters in detail.
- Use java -Xms2048m -Xmx2048m to restrict your program run to two gigabytes of memory. This ensures that it runs on the Pi cluster.
- Use LargeMessageProxy to process large messages.
- For checking memory usage, see this thread.
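To confirm that the heap limit from the hints above is in effect, the running JVM can be asked directly. The sketch below uses only the standard java.lang.Runtime API; the class HeapInfo and its helper methods are hypothetical names and not part of ddm-akka. It is one possible way to check memory usage, not necessarily the approach described in the linked thread.

```java
// Hypothetical helper, not part of ddm-akka.
public class HeapInfo {

    /** Converts a byte count to whole mebibytes, rounding down. */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    /** Heap currently in use: the allocated heap minus its free part. */
    static long usedBytes(Runtime runtime) {
        return runtime.totalMemory() - runtime.freeMemory();
    }

    public static void main(String[] args) {
        Runtime runtime = Runtime.getRuntime();
        // maxMemory() reflects the -Xmx setting; depending on the garbage
        // collector it may report slightly less than the configured value.
        System.out.println("max heap (MiB):  " + toMiB(runtime.maxMemory()));
        System.out.println("used heap (MiB): " + toMiB(usedBytes(runtime)));
    }
}
```

With Java 11 or newer, running java -Xms2048m -Xmx2048m HeapInfo.java should report a maximum heap of roughly 2048 MiB. The same two Runtime calls could be logged from inside an actor to watch memory usage during a run.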