These Docker images provide a Zeppelin server with supported interpreters such as spark, python, and md.
The repository yabhinav/zeppelin contains Dockerized Zeppelin container images, published to the public Docker Hub via an automated build mechanism. I use it to evaluate Spark code independently, in a more convenient way than with spark-shell.
These Docker images are built on top of the latest Debian container.
This Docker container contains a full Hadoop distribution with the following components:
- Oracle JDK 8
- Zeppelin 0.7.3
- Spark 2.2.0
- Miniconda data science toolkit (R and Python)
Use the latest tag to pull the latest Zeppelin server with all interpreters installed.
The image has two variations: minimal and all.
- minimal: includes the interpreters smaller than 50MB: angular, python, shell, bigquery, file, jdbc, kylin, livy, md, postgresql, cassandra, elasticsearch
- all: includes all the interpreters, so besides the interpreters listed above, the following interpreters are also included: alluxio, ignite, lens, beam, hbase, pig, scio
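As a quick way to decide which variation to pull, the two interpreter lists can be turned into a lookup. This is only a sketch: zeppelin_variant_for is a hypothetical helper name, not something shipped with the image, and it reports only what the lists above state.

```shell
#!/bin/sh
# zeppelin_variant_for is a hypothetical helper, not part of the image.
# It prints the smallest variation whose list names the given interpreter.
zeppelin_variant_for() {
  case "$1" in
    angular|python|shell|bigquery|file|jdbc|kylin|livy|md|postgresql|cassandra|elasticsearch)
      echo "minimal" ;;
    alluxio|ignite|lens|beam|hbase|pig|scio)
      echo "all" ;;
    *)
      # Not named in either list, so make no claim about it.
      echo "unknown"
      return 1 ;;
  esac
}

zeppelin_variant_for md      # prints: minimal
zeppelin_variant_for hbase   # prints: all
```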
Variants of these tags are available for Zeppelin release versions.
For example, for Zeppelin version 0.7.3, you can pull the following images from the release tags:
$ docker pull yabhinav/zeppelin:0.7.3-all
$ docker pull yabhinav/zeppelin:0.7.3-minimal
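The tag pattern in these two commands is the release version, a hyphen, and the variation name. A minimal sketch of composing it in a script follows; zeppelin_image is a hypothetical helper, and it does not check that a given tag was actually published on Docker Hub.

```shell
#!/bin/sh
# zeppelin_image is a hypothetical helper that composes an image reference
# from a Zeppelin release version and a variation (minimal or all).
zeppelin_image() {
  version="$1"
  variant="$2"
  case "$variant" in
    minimal|all) echo "yabhinav/zeppelin:${version}-${variant}" ;;
    *) echo "unknown variant: $variant" >&2; return 1 ;;
  esac
}

# Print the pull command instead of running it, so it can be reviewed first.
echo "docker pull $(zeppelin_image 0.7.3 minimal)"
```

Printing the command keeps the sketch safe to run on a machine without Docker; pipe the output to sh once the tag looks right.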
You can pull the latest image with all interpreters with the command:
$ docker pull yabhinav/zeppelin:latest
All data is stored in the /zeppelin directory, for example:
- ZEPPELIN_LOG_DIR: /zeppelin/log
- ZEPPELIN_PID_DIR: /zeppelin/run
- ZEPPELIN_NOTEBOOK_DIR: /zeppelin/notebook
To persist the data, mount a Docker volume on the /zeppelin directory.
$ docker volume create zeppelin-data
$ docker volume ls | grep zeppelin-data
$ docker run -d --name zeppelin -p 8080:8080 -p 4040:4040 -v zeppelin-data:/zeppelin yabhinav/zeppelin:latest
If you want to mount /zeppelin to a host directory instead of a Docker volume, note that the directory's owner uid is 501, which is the user zeppelin inside the container.
$ docker run -d --name zeppelin -p 8080:8080 -p 4040:4040 -v ~/Downloads/zeppelin-data:/zeppelin yabhinav/zeppelin:latest
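On a Linux host, a bind-mounted directory keeps its host ownership, so the host directory may need to belong to uid 501 before the container can write to it. The sketch below shows one way to prepare it. prepare_zeppelin_dir is a hypothetical helper, and pre-creating the log, run, and notebook subdirectories is an assumption; the image may well create them itself.

```shell
#!/bin/sh
# prepare_zeppelin_dir is a hypothetical helper: it creates the host directory
# with the three subdirectories listed earlier and hands it to uid 501.
prepare_zeppelin_dir() {
  dir="$1"
  mkdir -p "$dir/log" "$dir/run" "$dir/notebook" || return 1
  if [ "$(id -u)" -eq 0 ]; then
    # Only root can give files away to another uid.
    chown -R 501 "$dir"
  else
    # Do not guess at sudo; tell the caller what is left to do.
    echo "run as root to finish: chown -R 501 $dir" >&2
  fi
}
```

Call it as prepare_zeppelin_dir ~/Downloads/zeppelin-data before starting the container with the docker run command above.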
It's recommended to use docker-compose for the service; an example docker-compose.yml is provided for this purpose.
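For readers without the repository at hand, the following sketch writes a compose file that mirrors the docker run command with the zeppelin-data volume shown earlier. It is an assumed equivalent, not the repository's actual docker-compose.yml, and write_compose_file is a hypothetical helper name.

```shell
#!/bin/sh
# write_compose_file is a hypothetical helper; the YAML is an assumed
# equivalent of the docker run command, not the repository's own file.
write_compose_file() {
  cat > "$1" <<'EOF'
version: "3"
services:
  zeppelin:
    image: yabhinav/zeppelin:latest
    ports:
      - "8080:8080"   # default Zeppelin web UI port
      - "4040:4040"   # default Spark application UI port
    volumes:
      - zeppelin-data:/zeppelin
volumes:
  zeppelin-data:
EOF
}

# Never overwrite a compose file that is already in the project directory.
[ -e docker-compose.yml ] || write_compose_file docker-compose.yml
```

The named volume is declared at the top level so that Compose creates and manages it together with the service.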
- From your project directory, type docker-compose up to start the Zeppelin container:
$ docker-compose up
- If you want to run your services in the background, you can pass the -d flag (for “detached” mode) to docker-compose up and use docker-compose ps to see what is currently running:
$ docker-compose up -d
Creating network "dockerzeppelin_default" with the default driver
Creating volume "dockerzeppelin_zeppelin-data" with default driver
Creating dockerzeppelin_zeppelin_1 ... done
$ docker-compose ps
Name                        Command                          State   Ports
-------------------------------------------------------------------------------------------------------------------
dockerzeppelin_zeppelin_1   /bin/sh -c ${ZEPPELIN_HOME ...   Up      0.0.0.0:4040->4040/tcp, 0.0.0.0:8080->8080/tcp
- If you started Compose with docker-compose up -d, stop your services once you’ve finished with them:
$ docker-compose stop
- You can bring everything down, removing the containers entirely, with the down command. Pass --volumes to also remove the data volume used by the zeppelin container:
$ docker-compose down --volumes
MIT / BSD
Created by Abhinav Yalamanchili