Dockerized and preconfigured Zeppelin. This project is a sandbox for learning Zeppelin and Spark. The applications run in Docker, so students do not have to deal with the somewhat complex technical dependencies. The goal is to enable a quick start with the technology.
Apache Zeppelin is a web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more. The Zeppelin notebooks make heavy use of Apache Spark, which is a fast and general engine for large-scale data processing. For various tasks, Spark itself relies on Apache Hadoop, which is open-source software for reliable, scalable, distributed computing.
The Zeppelin notebooks live in a separate repository and are integrated as a Git submodule.
Presentation slides can be found here
- Zeppelin version = 0.8.1
- Spark version = 2.4.3
- Hadoop version = 2.8.5
- Docker
- Docker Compose
- RAM = 5GB
- RAM = 2GB
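Before starting, it can help to confirm that the tools listed above are installed. The following is a minimal sketch, not part of this project: the helper name `check_cmd` is hypothetical, and the script only checks that `docker` and `docker-compose` are on the `PATH`. It does not verify versions or available RAM.

```shell
#!/bin/sh
# Preflight sketch: report whether each required command is on PATH.
# check_cmd is a hypothetical helper, not something shipped with this project.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

missing=0
for tool in docker docker-compose; do
    # Remember a failure, but keep checking the remaining tools.
    check_cmd "$tool" || missing=1
done

if [ "$missing" -ne 0 ]; then
    echo "Install the missing tools before continuing."
fi
```

Run it once before `docker-compose up -d`. Any line starting with `missing:` names a tool to install first.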
git clone https://github.com/marhan/docker-zeppelin-tutorial.git
git clone https://github.com/marhan/zeppelin-notebook-samples.git docker-zeppelin-tutorial/zeppelin/notebook
cd docker-zeppelin-tutorial
docker-compose up -d
- Zeppelin
- Adminer (Postgres)
- PostgreSQL
  - System: PostgreSQL
  - Server: postgresdb
  - Username: zeppelin_admin
  - Password: zeppelin_admin
  - Database: zeppelin
- MariaDB
  - System: MySQL
  - Server: mariadb
  - Username: root
  - Password: zeppelin
  - Database: zeppelin
- PostgreSQL
- Webdav Web Server
  - Username: zeppelin
  - Password: zeppelin
- Minio Cloud Storage Server
  - Access Key: zeppelin
  - Secret Key: zeppelin
docker-compose rm -f -s -v zeppelin
docker rmi zeppelin
docker-compose up --force-recreate zeppelin
docker-compose stop
docker-compose rm -f -v
docker-compose exec zeppelin bash
docker ps # running containers, default columns
docker ps --format '{{.Names}}' # names only
docker rm $(docker ps -a -q)
docker rmi $(docker images -q)
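The two cleanup commands above fail with a "requires at least 1 argument" error when there are no containers or images to remove, because the command substitution expands to nothing. A minimal sketch of a guard follows. The helper name `run_if_any` is hypothetical and not part of Docker or this project.

```shell
#!/bin/sh
# Hypothetical helper: run a command only when the ID list is non-empty.
# First argument: whitespace-separated list of IDs. Remaining arguments: the command.
run_if_any() {
    ids=$1
    shift
    if [ -n "$ids" ]; then
        # Intentionally unquoted so that each ID becomes its own argument.
        "$@" $ids
    else
        echo "nothing to do for: $*"
    fi
}

# Usage with the commands above (requires a running Docker daemon):
# run_if_any "$(docker ps -a -q)" docker rm
# run_if_any "$(docker images -q)" docker rmi
```

Like the original commands, this removes all containers and all images on the host, not only the ones belonging to this project.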
In the repository root folder, you can execute the command below.
git submodule add git@github.com:marhan/zeppelin-notebook-samples.git zeppelin/notebook
This will create the file .gitmodules with the entry below.
[submodule "zeppelin/notebook"]
path = zeppelin/notebook
url = git@github.com:marhan/zeppelin-notebook-samples.git
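To check what Git has recorded for the submodule, the entry can be read back with `git config -f`. The sketch below is an illustration under one assumption: it writes a copy of the entry above to a temporary file so that it runs anywhere. In the repository itself you would point `-f` at the submodule file in the root folder instead.

```shell
#!/bin/sh
# Write a copy of the submodule entry to a temporary file (assumption: used
# here only so the sketch is self-contained).
tmp=$(mktemp)
printf '[submodule "zeppelin/notebook"]\n\tpath = zeppelin/notebook\n\turl = git@github.com:marhan/zeppelin-notebook-samples.git\n' > "$tmp"

# Keys have the form submodule.<name>.<setting>; the name may contain a slash.
submodule_path=$(git config -f "$tmp" --get submodule.zeppelin/notebook.path)
submodule_url=$(git config -f "$tmp" --get submodule.zeppelin/notebook.url)

echo "$submodule_path -> $submodule_url"
rm -f "$tmp"
```

Note that the recorded URL uses SSH, so fetching the submodule through Git requires a GitHub SSH key. The HTTPS `git clone` into `zeppelin/notebook` shown earlier avoids that requirement.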