Setup scripts and documentation to integrate Spark into the Cobald/Tardis system
- Clone this repository including the submodules:

      git clone --recursive https://github.com/stwunsch/cobald-tardis-spark
- Install the required software. The `install.sh` script installs the required Python and Java software:

      cd cobald-tardis-spark/
      ./install.sh
- Set the configuration. Have a look at the `config.sh` file, set the correct values, and then run the `configure.sh` script:

      ./configure.sh
- Adapt the configuration in `hadoop-config/yarn-site.xml`: set `yarn.nodemanager.resource.cpu-vcores` to at least 2 and `yarn.nodemanager.resource.memory-mb` to at least 2500.
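The two properties above are set as `<property>` entries in `hadoop-config/yarn-site.xml`. A minimal sketch with the stated minimum values (any other properties already present in the file stay as they are):

```xml
<configuration>
  <!-- CPU cores the nodemanager offers to YARN; at least 2 -->
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>2</value>
  </property>
  <!-- Memory in MB the nodemanager offers to YARN; at least 2500 -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2500</value>
  </property>
</configuration>
```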
- Go to the machine which should act as the master (aka the ResourceManager in YARN) and run:

      ./run-resourcemanager.sh

- Go to the machine which should act as the worker (aka the NodeManager in YARN) and run:

      ./run-nodemanager.sh
- Run the test script:

      ./test-spark.sh