A Kubernetes deployment of a stand-alone Apache Spark cluster.
With this setup, the Spark UI is accessible via kubectl alone - no new load balancers and no new external ports to open.
I created these files and the accompanying howto doc to control the Spark version and the extra modules deployed on the workers.
Developed and tested on Azure ACS deployed via acs-engine. (WARNING: I've broken the gs:// support for now.)
I presume you have your own working kubectl environment.
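A quick sanity check before deploying anything:
$ kubectl cluster-info
$ kubectl get nodes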
- The k8s manifests are originally from the k8s spark example.
- The Docker images are at https://github.com/navicore/spark based on https://github.com/kubernetes/application-images/tree/master/spark.
- The Spark UI Proxy is https://github.com/aseigneurin/spark-ui-proxy.
$ kubectl create -f spark-master-controller.yaml
replicationcontroller "spark-master-controller" created
$ kubectl create -f spark-master-service.yaml
service "spark-master" created
$ kubectl create -f spark-worker-controller.yaml
replicationcontroller "spark-worker-controller" created
Done.
... Unless you want access to the UIs. In that case:
$ kubectl create -f spark-ui-proxy-controller.yaml
replicationcontroller "spark-ui-proxy-controller" created
$ kubectl create -f zeppelin-controller.yaml
replicationcontroller "zeppelin-controller" created
Done.
... Unless you also want to actually use the UIs. In that case:
- for the Kubernetes UI:
  kubectl proxy --port=8001
  and follow link: kube ui
- for the Spark UI (see the note on finding <POD-ID> below this list):
  kubectl port-forward spark-ui-proxy-controller-<POD-ID> 8080:80
  and follow link: spark master ui
- for the Zeppelin UI:
  kubectl port-forward zeppelin-controller-<POD-ID> 8081:8080
  and follow link: zeppelin ui
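To find the <POD-ID> suffixes above, list the pods spawned by each controller:
$ kubectl get pods | grep spark-ui-proxy-controller
$ kubectl get pods | grep zeppelin-controller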
Example from an sbt project with an assembly task:
sbt assembly && kubectl exec -i spark-master-controller-<ID> -- /bin/bash -c 'cat > my.jar && /opt/spark/bin/spark-submit --deploy-mode client --master spark://spark-master:7077 --class my.Main ./my.jar' < target/scala-2.10/*.jar
Don't look or think, just run kubectl create -f .
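To see everything it created:
$ kubectl get rc,svc,pods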
Use the gen_new_cluster.sh script to create new standalone Spark clusters that can run safely within the same Kubernetes subnet. The script changes the name of the master used by the rest of the containers - no kube namespace is used.
PREFIX="my-fav-cluster" ./gen_new_cluster.sEFIX="my" ./gen_new_cluster.sh
Then cd build, edit the files to adjust replica counts, ports, memory, etc., and deploy with kubectl create -f .
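End to end, spinning up a second cluster looks something like this (assuming the script writes its manifests into build/, as the steps above imply):
$ PREFIX="my-fav-cluster" ./gen_new_cluster.sh
$ cd build
$ # edit replica counts, ports, memory, etc. in the yaml files
$ kubectl create -f .
$ kubectl get pods | grep my-fav-cluster  # assuming the prefix shows up in the pod names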