!Archived!

This project is archived. Originally it was part of the https://github.com/flokkr project, but it's no longer maintained.

Hadoop cluster on kubernetes

Example kubernetes definition for the flokkr docker images.

Topic	Solution
Configuration management
Source of config files:	Kubernetes configuration object.
Configuration preprocessing:	No.
Automatic restart on config change:	No.
Provisioning and scheduling
Multihost support	Yes.
Requirements on the hosts	Full kubernetes cluster.
Definition of the containers per host	Kubernetes resource definition.
Scheduling (find hosts with available resource)	Yes. But because the kubernetes limitations, only Stateful Sets are used.
Failover on host crash	Yes
Scale up/down:	Yes, with kubernetes.
Multi tenancy (multiple cluster)	Yes.
Network
Network between containers	Kubernetes network.
DNS	Yes, kubedns based.
Service discovery	Based on dns.
Data locality	N/A
Availability of the ports	Ingress definition is needed.

You need a kubernetes cluster. The easiest way is just using a provider (eg. GCE)

Upload the predefined configuration with:

kubectl apply -f config.yaml

kubectl apply -f hdfs.yaml