Hadoop-YARN-k8s Sandbox

This is a sandbox for running a Hadoop YARN cluster on Kubernetes (using Minikube).

The sandbox starts with a single command (make deploy) and brings up a Hadoop YARN cluster with two DataNodes, one NameNode, and one ResourceManager.

The cluster's web interfaces are automatically proxied to the host machine and can be reached at the URLs listed under Important URLs.

Warning

The sandbox is intended to be used for testing and development purposes only.


Prerequisites

System Requirements

  • Minikube needs at least 8 GB of memory and 4 CPUs for the sandbox to run properly (both values can be changed in the Makefile).
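
If you start Minikube yourself rather than through the Makefile, the requirement above can be expressed roughly as follows. This is only a sketch: start_minikube is a hypothetical helper name, and the Makefile may already start Minikube with equivalent settings, in which case this step is unnecessary.

```shell
# Hypothetical helper (not part of the sandbox): start Minikube with the
# resources listed above. Defaults assume 8 GB (8192 MB) of memory and 4 CPUs.
start_minikube() {
  memory_mb="${1:-8192}"  # memory in MB
  cpus="${2:-4}"          # number of CPUs
  minikube start --memory="$memory_mb" --cpus="$cpus"
}

# Example (not executed here):
#   start_minikube
```

Calling start_minikube without arguments uses the defaults; pass different values only if you have also adjusted the Makefile accordingly.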

Running

  • Run make deploy to deploy the system.
  • Run make clean to bring down the system. All data will be lost!
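
A typical session might look like the sketch below. deploy_and_check is a hypothetical helper name, and because the namespace used by the sandbox is not stated here, the pod listing covers all namespaces.

```shell
# Hypothetical helper (not part of the sandbox): deploy the cluster, then
# list the pods so you can see whether they were created.
deploy_and_check() {
  make deploy &&
    kubectl get pods --all-namespaces  # namespace is an unknown, so list all
}

# Example (not executed here):
#   deploy_and_check
#   make clean   # tears everything down; all data will be lost
```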

Running Spark jobs

  • Run make spark_exec to exec into the Spark pod.
  • The work directory is mounted as /work in the Spark pod. Copy your Spark job into that directory and run it with spark-submit, passing --master yarn to run it on the YARN cluster.
  • Alternatively, start an interactive Spark shell with spark-shell --master yarn and run your jobs from there.
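
The spark-submit step can be sketched as a small wrapper to use inside the Spark pod. submit_to_yarn and the job path /work/my_job.py are hypothetical names chosen for illustration; only the --master yarn flag comes from the instructions above.

```shell
# Hypothetical helper (not part of the sandbox): submit a job from the
# mounted /work directory to the YARN cluster. Run it inside the Spark pod,
# that is, after `make spark_exec`.
submit_to_yarn() {
  job="$1"  # path to the job, e.g. /work/my_job.py (assumed name)
  shift     # any remaining arguments are passed through to the job
  spark-submit --master yarn "$job" "$@"
}

# Example (not executed here):
#   submit_to_yarn /work/my_job.py
```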

Managing the cluster or running MapReduce tasks

  • Run make shell to exec into the dfsadmin pod.
  • You can run HDFS commands using hdfs dfs or run MapReduce jobs using yarn jar.
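
As a rough illustration of both command families, the helpers below stage a file in HDFS and run the word-count program from the Hadoop MapReduce examples jar. hdfs_stage and run_wordcount are hypothetical names, and the location of the examples jar inside the pod is an assumption you will need to verify for this image.

```shell
# Hypothetical helpers (not part of the sandbox). Run them inside the
# dfsadmin pod, that is, after `make shell`.

# Copy a local file into an HDFS directory and list the result.
hdfs_stage() {
  src="$1"   # local file to upload
  dest="$2"  # target HDFS directory, e.g. /input (assumed name)
  hdfs dfs -mkdir -p "$dest" &&
    hdfs dfs -put -f "$src" "$dest" &&
    hdfs dfs -ls "$dest"
}

# Run the wordcount example. The output directory must not exist yet,
# otherwise the job refuses to start.
run_wordcount() {
  jar="$1"     # path to the hadoop-mapreduce-examples jar (location assumed)
  input="$2"   # HDFS input directory
  output="$3"  # HDFS output directory
  yarn jar "$jar" wordcount "$input" "$output"
}

# Example (not executed here; the jar path is an assumption):
#   hdfs_stage data.txt /input
#   run_wordcount "$HADOOP_HOME"/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar /input /output
```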

Important URLs


Screenshots

[Screenshots: DataNode, NameNode, NodeManager, ResourceManager, Spark History Server, Spark UI]

License

This project is licensed under the MIT License - see the LICENSE file for details.