This is a sandbox for running a Hadoop YARN cluster on Kubernetes (using Minikube).
The sandbox can be started with a single command and will bring up a Hadoop YARN cluster with 2 datanodes, 1 namenode and 1 resource manager.
The various web interfaces for the cluster are proxied and exposed on the host machine automatically and can be accessed via the URLs listed below.
Warning
The sandbox is intended to be used for testing and development purposes only.
- Minikube
- GNU Make
- Docker
buildx
CLI plugin
- Minikube should have at least 8GB of memory and 4 CPUs for the sandbox to run properly (This can be changed in the
Makefile
).
- Run
make deploy
to deploy the system. - Run
make clean
to bring down the system (All data will be lost!)
- Run
make spark_exec
to exec into the spark pod. - The
work
directory is mounted as/work
in the spark pod. You can copy your spark job to this directory and run it usingspark-submit
. (Use--master yarn
to run the job on the YARN cluster.) - Or you can enter the spark shell using
spark-shell --master yarn
and run your spark jobs interactively.
- Run
make shell
to exec into thedfsadmin
pod. - You can run HDFS commands using
hdfs dfs
or run MapReduce jobs usingyarn jar
.
- Datanodes:
- Node Managers:
- Namenode: http://namenode-0.namenode.hadoop.svc.localho.st:9870
- Resource Manager: http://resourcemanager-0.resourcemanager.hadoop.svc.localho.st:8089
- Yarn UI 2: http://resourcemanager-0.resourcemanager.hadoop.svc.localho.st:8089/ui2/
- Yarn Timeline Server: http://resourcemanager-0.resourcemanager.hadoop.svc.localho.st:8188
- Mapreduce Job History Server: http://resourcemanager-0.resourcemanager.hadoop.svc.localho.st:19888
- Spark History Server: http://spark.hadoop.svc.localho.st:18080
Hadoop Data Node | Hadoop Name Node |
YARN Node Manager | YARN Resource Manager |
Spark History Server | Spark UI |
This project is licensed under the MIT License - see the LICENSE file for details.