accumulo-spark

Docker containers providing an Apache Accumulo and Apache Spark environment.


Apache Accumulo Spark Multinode Cluster with Docker.

Docker containers with a prepared environment for running GeoTrellis jobs. The result is three containers (two slaves and one master) running on a single machine in distributed mode, so the host should have enough RAM for heavy GeoTrellis tasks.
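
As a quick sanity check that all three containers came up, something like the following should work; the name master1 appears later in this README, while slave1 and slave2 are assumed names:

    # List the cluster containers and their status (slave names are assumptions)
    docker ps --filter "name=master1" --filter "name=slave1" --filter "name=slave2" \
        --format "{{.Names}}: {{.Status}}"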

Build Multinode HDFS + Accumulo + Spark Cluster

  • Build serf container

    • cd accumulo-spark/serf
    • docker build -t daunnc/serf:latest .
  • Build as-base container

    • cd accumulo-spark/as-base
    • docker build -t daunnc/as-base:latest .
  • Build as-master container (NameNode / DataNode / ResourceManager / NodeManager); see the consolidated build sketch after this list

    • cd accumulo-spark/as-master
    • docker build -t daunnc/as-master-512m1:latest .
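
The three builds above can be chained in a small shell script; this is just a convenience sketch using the directories and tags already listed:

    #!/usr/bin/env bash
    # Build the images in dependency order: serf, then as-base, then as-master.
    set -e
    docker build -t daunnc/serf:latest            accumulo-spark/serf
    docker build -t daunnc/as-base:latest         accumulo-spark/as-base
    docker build -t daunnc/as-master-512m1:latest accumulo-spark/as-master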

Start the containers.

  • Run ./start-cluster.sh
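
For orientation, the commands inside start-cluster.sh are roughly of the following shape; the run flags and the master1 name come from this README, while the slave image and container names are assumptions (the script in the repo is authoritative):

    # Rough sketch only, not the actual script
    docker run -d -t --dns 127.0.0.1 --name master1 daunnc/as-master-512m1:latest
    docker run -d -t --dns 127.0.0.1 --name slave1 daunnc/as-slave:latest   # assumed image
    docker run -d -t --dns 127.0.0.1 --name slave2 daunnc/as-slave:latest   # assumed image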

Interaction example

  • Edit ./start-cluster.sh as needed (for example, to mount a local volume inside the containers): docker run -d -t --dns 127.0.0.1 -v /localFolder:/dockerFolder ...
  • Get inside the master container: docker exec -it master1 /bin/bash
  • Log in as hduser (su - hduser) to run jobs
  • Run jobs via spark-submit, using JARs and scripts from the mounted volume (/dockerFolder); see the example below
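
Putting the last steps together, a session inside the master container might look like this; the JAR name, main class, and the YARN master setting are placeholders, not taken from this repo:

    # Inside master1, as hduser; jar and class names below are hypothetical
    su - hduser
    spark-submit \
        --class com.example.geotrellis.IngestJob \
        --master yarn \
        /dockerFolder/geotrellis-job-assembly.jar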

License

Artistic License 2.0 (Artistic-2.0).