nghoanglongde/spark-cluster-with-docker

An implementation of Apache Spark (combined with PySpark and Jupyter Notebook) on top of a Hadoop cluster, using Docker.


Run Spark Cluster within Docker

This is an implementation of a Spark cluster on top of Hadoop (1 master node, 2 slave nodes) using Docker.

Follow these steps on Windows 10:

1. Clone the GitHub repo

# Step 1
git clone https://github.com/nghoanglong/spark-cluster-with-docker.git

# Step 2
cd spark-cluster-with-docker

2. Pull the Docker image

docker pull ghcr.io/nghoanglong/spark-cluster-with-docker/spark-cluster:1.0

3. Start the cluster

docker-compose up
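Steps 1 to 3 can be bundled into one function, sketched below. It is a minimal sketch, not part of the repo: it assumes a bash shell (for example Git Bash or WSL on Windows 10) with git, docker and docker-compose on PATH. The names `setup_cluster`, `run` and the `DRY_RUN` switch are hypothetical conventions of this sketch, and it passes `-d` to `docker-compose up` so the cluster runs detached instead of holding the terminal as in step 3.

```shell
#!/usr/bin/env bash
# Hypothetical helper: steps 1-3 of this README as one function.
# Assumes git, docker and docker-compose are installed and on PATH.
# DRY_RUN=1 (a convention of this sketch only) prints commands instead of running them.

REPO_URL="https://github.com/nghoanglong/spark-cluster-with-docker.git"
REPO_DIR="spark-cluster-with-docker"
IMAGE="ghcr.io/nghoanglong/spark-cluster-with-docker/spark-cluster:1.0"

# Execute the given command, or only echo it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

setup_cluster() {
  # Step 1: clone the repo unless it is already present, then enter it.
  if [ ! -d "$REPO_DIR" ]; then
    run git clone "$REPO_URL" || return 1
  fi
  run cd "$REPO_DIR" || return 1

  # Step 2: pull the prebuilt image.
  run docker pull "$IMAGE" || return 1

  # Step 3: start the cluster; -d runs it in the background.
  run docker-compose up -d
}
```

Source the file and call `setup_cluster`, or run `DRY_RUN=1 setup_cluster` first to preview the commands. Because the cluster is detached, use `docker-compose logs -f` from the repo directory to follow its output and `docker-compose down` to stop it.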

4. Access the web UIs

Hadoop cluster (HDFS NameNode): http://localhost:50070/
Hadoop cluster - YARN ResourceManager: http://localhost:8088/
Spark cluster (master UI): http://localhost:8080/
Jupyter Notebook: http://localhost:8888/
Spark history server: http://localhost:18080/
Spark job monitoring (only while a Spark application is running): http://localhost:4040/
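To check which of these UIs are reachable after `docker-compose up`, a small probe script can help. This is a minimal sketch, not part of the repo: it assumes curl is installed and that every UI serves plain HTTP (the default for Spark and Jupyter), and the function names `check_ui` and `check_all` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical health check for the web UIs listed in this README.
# Assumes curl is on PATH and every UI serves plain HTTP.

# Print "<name>: UP (HTTP <code>) <url>" if a server answers, else "<name>: DOWN <url>".
check_ui() {
  local name="$1" url="$2" code
  # With -w '%{http_code}', curl prints 000 when no HTTP response arrives;
  # an empty value (e.g. curl missing) is also treated as DOWN.
  code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 3 "$url" 2>/dev/null)"
  if [ -n "$code" ] && [ "$code" != "000" ]; then
    echo "$name: UP (HTTP $code) $url"
  else
    echo "$name: DOWN $url"
  fi
}

check_all() {
  check_ui "Hadoop NameNode" "http://localhost:50070/"
  check_ui "YARN ResourceManager" "http://localhost:8088/"
  check_ui "Spark master" "http://localhost:8080/"
  check_ui "Jupyter Notebook" "http://localhost:8888/"
  check_ui "Spark history server" "http://localhost:18080/"
  # Port 4040 only answers while a Spark application is running.
  check_ui "Spark job monitoring" "http://localhost:4040/"
}

# Run all checks when executed directly rather than sourced.
if [ "${BASH_SOURCE[0]}" = "$0" ]; then
  check_all
fi
```

Any HTTP status (including redirects such as Jupyter's login redirect) counts as UP, since the goal is only to confirm that the port mapping works; a DOWN on 4040 is expected until a job is started.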