Hadoop cluster

Running Hadoop in a Docker Swarm on multiple hosts.

Made for an assignment, not fit for anything remotely serious.

Getting started

You need at least one node acting as a Docker Swarm manager. Initialize the swarm on it:

# On manager
docker swarm init

This prints a docker swarm join command that you'll need to run on all other nodes so they join the swarm as workers.
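
The printed command looks roughly like the following; the token and the manager address are placeholders, use the values your manager actually prints.

# On every other node (token and address are placeholders)
docker swarm join --token SWMTKN-1-<token> <manager-ip>:2377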

Starting the cluster

We use docker stack deploy to start and manage the services.

docker stack deploy --compose-file=docker-compose.yml [name of the cluster]
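
For example, assuming you call the stack hadoop (the name is arbitrary):

# Deploy the stack under the example name "hadoop"
docker stack deploy --compose-file=docker-compose.yml hadoop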

Stopping a cluster

docker stack rm [name of the cluster]
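
With the example name from above:

# Remove the stack deployed as "hadoop" (example name)
docker stack rm hadoop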

Scaling a cluster

docker service scale [service name]=[number of desired replicas]
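
For example, assuming the stack is named hadoop and runs a datanode service (the service name is hypothetical, check docker service ls for the real one):

# Scale the (hypothetical) datanode service to 5 replicas
docker service scale hadoop_datanode=5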

Useful stuff

Visualize cluster health

You can access a visualizer by pointing a browser at [IP]:8080, where IP is the address of a node in the cluster (any node should work).
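
To find a node address to use, you can ask the swarm itself; docker node ls and docker node inspect are standard commands, and the node name below is an example.

# List the nodes in the swarm
docker node ls
# Print the address of one node (replace node-1 with a name from the list)
docker node inspect --format '{{ .Status.Addr }}' node-1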

Query logs of a service (all containers of one kind)

To get an aggregated log of all the containers of a specific service, run

docker service logs [service name]
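
For example, to stream the logs of a (hypothetical) namenode service live:

# Follow new log lines as they arrive (service name is an example)
docker service logs --follow hadoop_namenode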

Docker service names can be found with

docker service ls

Enter a container to issue commands

Running docker ps on a node lists the containers running there. Pick a container name from that output, then:

docker exec -it [container name] bash
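
Under swarm, container names combine the stack, service, slot, and task ID, so a real invocation looks something like this (all names are examples):

# Open a shell in a running container (name copied from docker ps)
docker exec -it hadoop_namenode.1.x2f9qzkd bash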

Known issues

  • Running the cluster on non-Linux hosts may cause issues with Docker's DNS-based service discovery (VIP mode).
  • Restarting the master causes all the nodes to fail.