- Spark v.2.2.2
- Hive v.2.3.3
- Zeppelin v.0.8.0
- Hadoop 3.1.1
- Traefik 1.6.5
- Kubernetes (k8s) support
- docker-compose
- Docker Swarm (not tested for a long time)
You can find a single-node example without persistent volumes in kubernetes/hadoop/singlenode-k8s/without-pvc.
You can find a single-node example with persistent storage based on CephFS in kubernetes/hadoop/singlenode-k8s/ceph-pvc.
You can find a multi-node example with persistent storage based on CephFS in kubernetes/hadoop/multi-node-k8s/ceph-pvc.
Deploy a k8s cluster with the flannel pod network
If you have your own DNS server, add it to the kubernetes/hadoop/dns/hadoop-dns-config.yaml file. Deploy the external DNS servers to the cluster.
kubectl create -f kubernetes/hadoop/dns/hadoop-dns-config.yaml
Set up a proxy to access the k8s Hadoop components
cd kubernetes/proxy/conf
make create-traefik-conf
cd ..
kubectl create -f k8s-ingress-traefik.yaml
After installing the cluster, you can access Apache Zeppelin at the URL /zeppelin on port 80 or 443. The Spark dashboard is available at the root URL /. The Traefik dashboard is available on port 8080.
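To check these endpoints after deployment, a small sketch like the one below may help. The function names and the host `hadoop.example.com` are placeholders invented for illustration and are not defined by this repository. `curl -k` skips certificate verification, on the assumption that the certificate may not be trusted by your machine.

```shell
#!/bin/sh
# Sketch only: probe the URLs mentioned above.
# The host passed in is a placeholder for the address of your ingress node.

# Print the three URLs described in the text for the host given as $1.
endpoint_urls() {
  printf '%s\n' "http://$1/zeppelin" "http://$1/" "http://$1:8080/"
}

# Print the HTTP status code returned for each URL.
# Requires curl and a running cluster.
check_endpoints() {
  for url in $(endpoint_urls "$1"); do
    # -k: skip certificate verification, -s: silent, -o /dev/null: discard body
    code=$(curl -k -s -o /dev/null -w '%{http_code}' "$url")
    printf '%s %s\n' "$code" "$url"
  done
}

# Usage (not run automatically):
#   check_endpoints hadoop.example.com
```

A redirect status on the HTTP URLs is not necessarily an error, since the proxy may forward HTTP requests to HTTPS.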
You can find information about setting up a Ceph cluster here
You can find information here
cd kubernetes/hadoop/stateful-set-ceph
kubectl create -f hadoop-env.yaml -f hadoop-kubernetes.yaml
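Once the resources are created, their state can be inspected with standard kubectl commands. The sketch below assumes it is run from the same directory as the manifests. The function name `show_hadoop_state` is hypothetical and only groups the two commands.

```shell
#!/bin/sh
# Sketch only: list what the two manifests created.

show_hadoop_state() {
  # Show the objects defined in the manifest files themselves.
  kubectl get -f hadoop-env.yaml -f hadoop-kubernetes.yaml
  # Show pods and persistent volume claims in the current namespace.
  kubectl get pods,pvc -o wide
}

# Usage (not run automatically):
#   show_hadoop_state
```

Pods that stay in Pending may indicate that their persistent volume claims have not been bound yet, which is worth checking first with the Ceph-backed variants.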
cd kubernetes/hadoop/stateful-set-ceph
kubectl delete -f hadoop-env.yaml -f hadoop-kubernetes.yaml
Install docker-compose
# From repo root dir
docker-compose down && docker-compose up
docker node update --label-add disk=ssd <host-id>
docker node update --label-add disk.type=hive-metastore <host-id>
Note the constraints section
hive-metastore-postgresql:
  image: dmitryzagr/hive-metastore-postgresql:2.2.0-hadoop2.8.1-java8
  hostname: hive-metastore-postgresql
  volumes:
    - hive-metastore-postgresql_data:/var/lib/postgresql/data
  deploy:
    placement:
      constraints:
        - node.role == worker
        - node.labels.disk == ssd
        - node.labels.disk.type == hive-metastore
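The service is scheduled only on a node that satisfies all of these constraints, so it can be useful to confirm that the labels were actually applied. A minimal sketch, where the function name `show_node_labels` is hypothetical and the argument is a node ID or host name as listed by `docker node ls`:

```shell
#!/bin/sh
# Sketch only: print the labels of a swarm node as JSON.
# Must be run on a swarm manager node.

show_node_labels() {
  # $1: node ID or host name, as shown by `docker node ls`
  docker node inspect --format '{{ json .Spec.Labels }}' "$1"
}

# Usage (not run automatically):
#   docker node ls
#   show_node_labels <host-id>
```

If either label is missing from the output, or the node is not a worker, the placement constraints above cannot be met and the service task will remain pending.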
Starting the Hadoop cluster
git clone https://github.com/DmitryZagr/docker-spark-hive-zeppelin.git
cd docker-spark-hive-zeppelin
docker stack deploy -c docker-stack.yml hadoop
Starting the monitoring services
docker stack deploy -c docker-stack-monitor.yml monitor
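After deploying, the state of both stacks can be checked with the standard `docker stack` subcommands. This is a sketch that reuses the stack names `hadoop` and `monitor` from the commands above. The function name `show_stacks` is hypothetical.

```shell
#!/bin/sh
# Sketch only: summarize the services and tasks of both stacks.
# Must be run on a swarm manager node.

show_stacks() {
  for stack in hadoop monitor; do
    # One line per service with its replica count.
    docker stack services "$stack"
    # One line per task, including the node it was placed on.
    docker stack ps "$stack"
  done
}

# Usage (not run automatically):
#   show_stacks
```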
Zeppelin Notebook is available at /zeppelin. On first start, the system generates a certificate to enable HTTPS. You need to accept this certificate. All HTTP traffic is redirected to the HTTPS port. Either port 80 or port 443 can be exposed externally.