IGA-ADI Giraph Solver

Distributed Isogeometric Alternating Directions Implicit Solver on Apache Giraph

Prerequisites

You need JDK 11 installed to compile the project; SDKMAN can help you manage JDK versions. You also need Maven 3.5.3 installed on your system. Processing the results additionally requires Excel and Node 12.10.0; nvm (Node Version Manager) can help you manage Node versions.
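
If you use SDKMAN and nvm, installing the toolchain might look roughly like the sketch below. The candidate and version identifiers are illustrative and may differ on your system, so check sdk list java, sdk list maven and nvm ls-remote first.

# install JDK 11 and Maven via SDKMAN (identifiers illustrative)
sdk install java 11.0.2-open
sdk install maven 3.5.3
# install and select Node 12.10.0 via nvm
nvm install 12.10.0
nvm use 12.10.0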

How to run

This solver can be run in the cloud in a matter of minutes. The scripts in this repository are prepared for Google Cloud Platform (GCP), although the process is similar in any cloud; in fact, it has also been tested on Azure and AWS.

First, you have to create an appropriate Hadoop cluster. On GCP the managed Hadoop offering is called Dataproc.

Modify one of the two scripts to match your needs:

  • bin/local/create.cluster.sh, good for running experiments

  • bin/local/create.singlenode.cluster.sh, good for testing the setup

The most important options there are (see the sketch after this list):

  • --master-machine-type=n1-standard-4, which selects the node type for the master

  • --worker-machine-type=n1-standard-8, which selects the node type for the workers

  • --master-min-cpu-platform="Intel Skylake", which selects the minimum CPU platform for the master

  • --worker-min-cpu-platform="Intel Skylake", which selects the minimum CPU platform for the workers

  • --num-workers=4, which selects the number of workers
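
For reference, the cluster creation command issued by such a script might look roughly like the sketch below. This is not the exact content of the scripts; the cluster name and region are placeholders you would adjust for your project.

gcloud dataproc clusters create iga-adi \
  --region=europe-west1 \
  --master-machine-type=n1-standard-4 \
  --worker-machine-type=n1-standard-8 \
  --master-min-cpu-platform="Intel Skylake" \
  --worker-min-cpu-platform="Intel Skylake" \
  --num-workers=4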

Once you have modified the script to your liking, execute it and wait for the cluster to be created. Next, issue the command that packages the solver and publishes it, together with all necessary scripts, to the master node of your newly created cluster.

./bin/local/publish.cloud.sh <your master instance number>

where <your master instance number> is iga-adi-m by default

Then you need to connect to the instance.

./bin/local/connect.sh <your master instance number>

where <your master instance number> is iga-adi-m by default

Running the experiments

Once you’re connected, there are multiple ways to run the experiments. You can run a series of experiments, one for each value of a swept parameter, as in the following example:

IGA_PROBLEM_SIZE=3072 \
IGA_WORKERS=4 \
IGA_WORKER_MEMORY=8 \
IGA_WORKER_CORES=8 \
IGA_STEPS=2 \
RUNS=1 \
SUITE_NAME=my-experiment-name \
IGA_CONTAINER_JVM_OPTIONS="-XX:+PrintFlagsFinal -XX:+UnlockDiagnosticVMOptions -XX:+UseParallelGC -XX:+UseParallelOldGC" \
./run.suite.sh IGA_WORKERS 4 2 1

Here, all values passed as environment variables stay fixed for the whole suite, while the swept variable, IGA_WORKERS, takes the values 4, then 2, and finally 1. You can sweep any of the variables this way; if you sweep a different one, fix the number of workers through the IGA_WORKERS environment variable instead.
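
For instance, to sweep the problem size while keeping the number of workers fixed, the invocation might look roughly like the sketch below; the values and the suite name are illustrative, and the full set of recognised variables is defined by run.suite.sh.

IGA_WORKERS=4 \
IGA_WORKER_MEMORY=8 \
IGA_WORKER_CORES=8 \
IGA_STEPS=2 \
RUNS=1 \
SUITE_NAME=problem-size-sweep \
./run.suite.sh IGA_PROBLEM_SIZE 1536 3072 6144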

You may want to change the value of SUITE_NAME to keep your result files organised, as they are catalogued under this name.

You can also define your own test suites and keep them in the repository. See bin/suites for the details. For instance, bin/cluster/suites/03-explicit-configs-to-run.sh provides a list of explicit configurations to run in sequence. Run your suite like this:

./suites/<your suite>.sh

Once the experiment is complete, make sure to retrieve your results to your local machine before you delete your cluster. Issue the following command from your local machine.

./bin/local/retrieve.cloud.sh <your master instance number>

where <your master instance number> is iga-adi-m by default

Processing the results

This repository contains a number of scripts for computing various statistics from the results generated in the experiments and for visualising their properties.

Once you have retrieved your results, look into the logs directory. There is a separate directory for each run in your experiment suite; its name starts with the suite name, followed by a unique application identifier.

In order to process the results, aggregate them into a structure similar to the one in the results-sample directory, that is, grouped by problem size.
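
A hypothetical layout, mirroring results-sample, might look like the sketch below; all directory names are purely illustrative.

my-experiment-name/                        # pass this directory to extract-suite.sh
  3072/                                    # one directory per problem size
    my-experiment-name-application_0001/   # one directory per run
    my-experiment-name-application_0002/
  6144/
    my-experiment-name-application_0003/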

Once you do this, you should be able to run

./results-external-extraction/extract-suite.sh <the directory of your suite>

where <the directory of your suite> is the parent directory of the per-problem-size directories. This prints a CSV to the console. Copy it into the template Excel file located under results-external-extraction/scalability_template.xlsx. You might need to use the regular "Text to Columns" functionality to fill the individual cells correctly. The template calculates speedup and other global metrics that are required for some of the visualisations. Save it as a separate file.

Most of the time you will also want to inspect the internals of your experiments, that is, what was happening in the cluster. For that, you need to run:

node build/main/index.js -i <the path to your simulations directory> -o <the path to the output excel file>

This should produce an Excel file with many rows and columns, each row describing a particular superstep, covering all experiments.

Finally, using these two Excel files, you can regenerate images. Do this by executing:

./results-charts/regenerate-images.sh <SCALABILITY_XLSX> <SUPERSTEPS_XLSX>

This will take some time depending on the number of experiments (sometimes even an hour). The images are generated continuously under the results-charts/out directory.