The source code of BigSift is available at https://github.com/maligulzar/bigdebug/tree/bigsift-demo
Before building the Docker image, download the following two files and place them in the BigSift-Zeppelin directory:
1. spark-2.1.1-SNAPSHOT-bin-2.2.0.tgz, available at BigSift.
2. The Zeppelin binary with all interpreters, available at Zeppelin. Extract it in the docker directory with tar -xzf zeppelin.tar.gz
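A quick pre-flight check can catch a missing download before the long build starts. The sketch below is a hypothetical helper, not part of BigSift: the function names check_downloads and extract_zeppelin are illustrative, and it assumes the archives keep the file names given above.

```shell
# Hypothetical helpers; archive names are taken from the instructions above.
SPARK_TGZ="spark-2.1.1-SNAPSHOT-bin-2.2.0.tgz"
ZEPPELIN_TGZ="zeppelin.tar.gz"

# check_downloads [DIR]: report each missing archive and fail if any is absent.
check_downloads() {
  dir="${1:-.}"
  rc=0
  for f in "$SPARK_TGZ" "$ZEPPELIN_TGZ"; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      rc=1
    fi
  done
  return "$rc"
}

# extract_zeppelin [DIR]: unpack the Zeppelin archive inside DIR.
extract_zeppelin() {
  dir="${1:-.}"
  (cd "$dir" && tar -xzf "$ZEPPELIN_TGZ")
}
```

For example, check_downloads . && extract_zeppelin . only extracts once both files are in place.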
Now install Docker on your local machine (follow the official Docker installation instructions). After the installation completes, launch the Docker application, which starts the Docker service (on a Mac, a whale icon appears in the status bar). If this step succeeds, you should be able to run 'docker' from your command-line console.
Assuming Docker is installed and you are in the BigSift-Zeppelin directory, you should see a file named "DockerFile" there. The following command builds a Docker image from the DockerFile in the current directory ('.') and names the image "spark".
docker build -t spark .
This step takes several minutes: Docker builds the image from the recipe, pulling the required packages and running each command in turn. Because it downloads Spark, Scala, and the other tools required for your subsequent assignments, make sure your machine has enough disk space (several GB; mine was about 2.17 GB) and memory. You should see messages similar to the following:
bash-3.2$ docker build -t spark .
Sending build context to Docker daemon 1.954GB
Step 1/34 : FROM debian:jessie
---> 25fc9eb3417f
Step 2/34 : MAINTAINER Getty Images "https://github.com/gettyimages"
---> Using cache
---> 3106ccca439d
...
To list the images you have built, run
docker images
To list all containers (running and stopped) along with their status, run
docker ps -a
Once the docker image is built, you can start the cluster using docker-compose.
docker-compose up
Use docker-compose up only when launching the cluster for the first time. Afterwards, start the cluster with
docker-compose start
The docker-compose up command brings up the cluster defined in the recipe docker-compose.yml and prints output similar to the following:
Starting dockerspark_master_1 ...
Starting dockerspark_master_1 ... done
Starting dockerspark_worker_1 ...
Starting dockerspark_worker_1
Starting dockerspark_zeppelin_1 ...
Starting dockerspark_zeppelin_1 ... done
Attaching to dockerspark_master_1, dockerspark_worker_1, dockerspark_zeppelin_1
zeppelin_1 | Zeppelin start [ OK ]
Give this step a few seconds to set up everything and start all the nodes.
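The rule "up the first time, start afterwards" can be scripted. This is a hypothetical sketch, not part of BigSift: it treats an empty docker-compose ps -q (no container IDs for this project) as a first launch. Depending on your Compose version, ps may list only running containers; in that case a stopped cluster falls back to up, which also starts existing containers (recreating them only if their configuration or image changed).

```shell
# Hypothetical helper: run `up` when the project has no containers yet,
# otherwise `start`. DOCKER_COMPOSE defaults to `docker-compose`; it is left
# unquoted on purpose so a multi-word command such as `docker compose` works.
start_cluster() {
  compose="${DOCKER_COMPOSE:-docker-compose}"
  if [ -z "$($compose ps -q)" ]; then
    $compose up
  else
    $compose start
  fi
}
```

Run it from the directory containing docker-compose.yml, just like the plain commands.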
The cluster is now set up. Open localhost:6060 (port 6060 on your local machine) in a browser to access the Zeppelin notebook.
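Instead of guessing how many seconds to wait, you can poll the notebook port until it answers. A minimal sketch assuming curl is installed; wait_for_url and its retry defaults are hypothetical.

```shell
# Hypothetical helper: poll URL until it answers or TRIES attempts run out,
# sleeping DELAY seconds between attempts. `curl -s -o /dev/null` exits 0 as
# soon as any HTTP response arrives, and non-zero while the port refuses.
wait_for_url() {
  url="$1"
  tries="${2:-30}"
  delay="${3:-2}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -s -o /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

For example, wait_for_url http://localhost:6060 && echo "Zeppelin is up" waits up to about a minute with the defaults.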
Use the following command to attach to any container in the cluster.
docker exec -it <container-name> /bin/bash
where <container-name> is one of the names printed when the cluster starts, such as dockerspark_master_1; docker ps also lists them.
Use the following command to shut down the cluster. First transfer any important data from the containers to the host machine: docker-compose stop keeps the stopped containers, but data stored only inside a container is lost once that container is removed (for example, by docker-compose down).
docker-compose stop
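Data can be copied from a container to the host with docker cp before shutting the cluster down. A minimal sketch; backup_from_container is a hypothetical name, and the path inside the container is a placeholder you must replace with wherever your data actually lives.

```shell
# Hypothetical helper: copy SRC_PATH out of CONTAINER into DEST_DIR on the
# host using `docker cp`, creating DEST_DIR first. DOCKER defaults to
# `docker` and can be overridden (e.g. for testing).
backup_from_container() {
  container="$1"
  src="$2"
  dest="$3"
  mkdir -p "$dest" || return 1
  ${DOCKER:-docker} cp "$container:$src" "$dest/"
}
```

For example: backup_from_container dockerspark_zeppelin_1 /path/in/container ./backup (the container path is illustrative).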
If a Spark job cannot be submitted through the notebook (for example, a "Spark Context not present" exception), restart the cluster with docker-compose down followed by docker-compose up.
The down command stops and removes the containers and networks created by up, so any data stored only inside the containers is lost. Volumes and images are removed only if you pass the -v or --rmi option.