- Java - Installation Instructions
- Git - Installation Instructions
- Docker - Installation Instructions
- Clone this repo to your local machine.
- Execute the script `RunSparkJobOnDocker.sh` (see the example below).
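For instance, assuming the repository lives under the pavanpkulkarni GitHub account (the clone URL here is a guess; substitute the actual one), the two steps look like:

```bash
# Clone the repo and launch the driver script
# (the clone URL is an assumption; use the real repository URL)
git clone https://github.com/pavanpkulkarni/create-and-run-spark-job.git
cd create-and-run-spark-job
./RunSparkJobOnDocker.sh
```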
This repository contains all the files required to create an n-node Spark cluster and run a simple app on it. In this project, the script RunSparkJobOnDocker.sh does the following (a condensed sketch of the script appears after this list):
- Pull the Spark image (tag: 2.2.1) from Docker Hub.
- Build and create an n-node cluster. Here I'm creating a 3-node cluster; the node count can be changed by specifying
  `docker-compose up -d --scale slave=$number_of_nodes`
- Wait for 10 seconds so that Docker fully establishes the network connections.
- Run the job on the cluster. You can either pull this source code and build it with Gradle, or try something of your own.
- Optional - after the job completes successfully, bring down the cluster by running
  `docker-compose down`
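Putting the steps above together, here is a condensed sketch of what RunSparkJobOnDocker.sh might contain. The master container name, Spark master URL, and spark-submit jar path are assumptions for illustration, not the script's actual contents:

```bash
#!/bin/bash
# Pull the Spark image (tag 2.2.1) from Docker Hub
docker pull pavanpkulkarni/spark_image:2.2.1

# Bring up the cluster: one master plus N worker (slave) containers
number_of_nodes=3
docker-compose up -d --scale slave=$number_of_nodes

# Give Docker time to establish the network connections
sleep 10

# Submit the sample job from inside the master container
# (container name, master URL, and jar path are assumptions)
docker exec create-and-run-spark-job_master_1 \
  spark-submit --master spark://master:7077 /opt/spark-apps/sample-app.jar

# Optional: tear the cluster down once the job completes
# docker-compose down
```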
The cluster can be monitored through the following web UIs:

- Master - localhost:8080
- History Server - localhost:18080
- Executors - the port bindings can be found by running `docker ps -a`
For example:
```
Pavans-MacBook-Pro:create-and-run-spark-job pavanpkulkarni$ docker ps -a
CONTAINER ID   IMAGE                              COMMAND                  CREATED          STATUS          PORTS                                                                         NAMES
30a51b5f5a77   pavanpkulkarni/spark_image:2.2.1   "/usr/bin/supervisor…"   12 seconds ago   Up 16 seconds   4040/tcp, 6066/tcp, 7077/tcp, 8080/tcp, 18080/tcp, 0.0.0.0:32854->8081/tcp   create-and-run-spark-job_slave_3
```
So, this executor can be accessed at localhost:32854.
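To query a single container's mapping directly, `docker port` works as well; the container name below matches the `docker ps` output above:

```bash
# Print the host port bound to the worker's 8081 web UI
docker port create-and-run-spark-job_slave_3 8081
# prints: 0.0.0.0:32854
```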
Check this repo for Docker image details.
Run the script `RemoveContainersAndImages.sh` to remove all the containers and images.
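For reference, a minimal sketch of what such a cleanup script could look like (the actual contents of RemoveContainersAndImages.sh may differ):

```bash
#!/bin/bash
# Stop and remove the cluster containers and their network
docker-compose down

# Remove the Spark image that was pulled for the cluster
docker rmi pavanpkulkarni/spark_image:2.2.1
```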