
Create a Spark on Yarn Cluster

This is a wrapper cookbook around the hadoop cookbook.

This setup creates 3 Vagrant boxes: 1 master and 2 slaves.


To set up the cluster

  1. berks vendor cookbooks
  2. vagrant up --provision

To destroy the cluster

  1. vagrant destroy -f

To view the web interfaces hosted on this cluster

Append the following to the /etc/hosts file, prefixing each entry with the IP address of the corresponding box:

  local-spark-cluster-master.org.local local-spark-cluster-master
  local-spark-cluster-slave01.org.local local-spark-cluster-slave01
  local-spark-cluster-slave02.org.local local-spark-cluster-slave02

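
Every /etc/hosts line has to start with an IP address, so the entries can be generated instead of typed by hand. The following is a minimal sketch, assuming bash. `hosts_entry` and `cluster_hosts` are hypothetical helper names, and the 192.0.2.x addresses are documentation placeholders, not the real addresses of the boxes. Replace them with the addresses your Vagrant setup assigns.

```shell
#!/usr/bin/env bash
# hosts_entry and cluster_hosts are hypothetical helpers, not part of this cookbook.

# Prints one /etc/hosts line: "<ip> <name>.org.local <name>".
hosts_entry() {
  local ip="$1" name="$2"
  printf '%s %s.org.local %s\n' "$ip" "$name" "$name"
}

# Placeholder IPs (192.0.2.0/24 is reserved for documentation); replace them.
cluster_hosts() {
  hosts_entry 192.0.2.10 local-spark-cluster-master
  hosts_entry 192.0.2.11 local-spark-cluster-slave01
  hosts_entry 192.0.2.12 local-spark-cluster-slave02
}
```

Once the addresses are correct, `cluster_hosts | sudo tee -a /etc/hosts` would append the three lines.
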
  Machine   Name of interface       URI
  Master    YARN ResourceManager    http://local-spark-cluster-master.org.local:8088/
  Slave01   YARN NodeManager        http://local-spark-cluster-slave01.org.local:8042/
  Slave02   YARN NodeManager        http://local-spark-cluster-slave02.org.local:8042/
  Master    Hadoop HDFS NameNode    http://local-spark-cluster-master.org.local:50070/
  Slave01   Hadoop HDFS DataNode    http://local-spark-cluster-slave01.org.local:50075/
  Slave02   Hadoop HDFS DataNode    http://local-spark-cluster-slave02.org.local:50075/
  Master    Spark HistoryServer     http://local-spark-cluster-master.org.local:18080/
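
A quick way to confirm that the interfaces listed above respond is to request each URL in turn. The following is a rough sketch, assuming bash and curl are available on the machine whose /etc/hosts was edited. `ui_urls` and `check_uis` are hypothetical helper names, and the 5 second timeout is an arbitrary choice.

```shell
#!/usr/bin/env bash
# ui_urls and check_uis are hypothetical helper names, not part of this cookbook.

# Prints the web interface URLs listed above, one per line.
ui_urls() {
  cat <<'EOF'
http://local-spark-cluster-master.org.local:8088/
http://local-spark-cluster-slave01.org.local:8042/
http://local-spark-cluster-slave02.org.local:8042/
http://local-spark-cluster-master.org.local:50070/
http://local-spark-cluster-slave01.org.local:50075/
http://local-spark-cluster-slave02.org.local:50075/
http://local-spark-cluster-master.org.local:18080/
EOF
}

# Requests each URL (following redirects) and reports whether it answered.
check_uis() {
  local url
  ui_urls | while read -r url; do
    if curl --silent --fail --location --max-time 5 --output /dev/null "$url"; then
      echo "up   $url"
    else
      echo "DOWN $url"
    fi
  done
}
```

Calling `check_uis` after `vagrant up` has finished prints one line per interface.
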


As part of this setup, the following services are configured

On the master:

  1. HDFS NameNode
  2. YARN ResourceManager
  3. Spark History Server

On each slave:

  1. HDFS DataNode
  2. YARN NodeManager

Submit a Spark application

Log in to the master machine
vagrant ssh master

Switch to the hdfs user
sudo su - hdfs

Spark submit
spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --executor-memory 1G /usr/hdp/
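
With --deploy-mode cluster the driver runs inside a YARN container, so the application's output lands in the YARN container logs rather than on the terminal. The following is a sketch for retrieving it, assuming the `yarn` command is on the hdfs user's PATH and that YARN log aggregation is enabled on this cluster (not confirmed here). `show_app_logs` and `list_apps` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# show_app_logs and list_apps are hypothetical helpers; run them on the master
# as the hdfs user.

# Usage: show_app_logs <application id reported by spark-submit>
show_app_logs() {
  local app_id="${1:-}"
  if [ -z "$app_id" ]; then
    echo "usage: show_app_logs <application-id>" >&2
    return 1
  fi
  # Prints the aggregated container logs, which include the driver's stdout.
  yarn logs -applicationId "$app_id"
}

# Lists applications in every state, so finished runs appear too.
list_apps() {
  yarn application -list -appStates ALL
}
```

The application id is also visible in the ResourceManager web interface on port 8088.
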