micro-cluster-lab

A micro cluster lab, based on Docker, for experimenting with Dask and Spark (Python and Scala)

Micro-Cluster Lab Using Docker, To Experiment With Spark & Dask on Yarn

For more details about this project, please refer to my article, where I explain the motivation behind it and how to recreate it yourself.

Project Folder Tree

├── docker-compose.yml
├── Dockerfile
├── confs
│   ├── config
│   ├── core-site.xml
│   ├── hdfs-site.xml
│   ├── mapred-site.xml
│   ├── requirements.req
│   ├── slaves
│   ├── spark-defaults.conf
│   └── yarn-site.xml
├── datasets
│   ├── alice_in_wonderland.txt
│   └── iris.csv
├── notebooks
│   ├── Bash-Interface.ipynb
│   ├── Dask-Yarn.ipynb
│   ├── Python-Spark.ipynb
│   └── Scala-Spark.ipynb
└── script_files
    └── bootstrap.sh

Create the base container image

docker build . -t cluster-base

Run the cluster or micro-lab

docker-compose up -d

Yarn resource manager UI

Access the Yarn resource manager UI at: http://localhost:8088/cluster/nodes
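
The same node information can also be read programmatically. The following is a minimal sketch, assuming the ResourceManager's REST API is exposed on the same host and port as the UI (its standard `/ws/v1/cluster/nodes` endpoint); the function names `summarize_nodes` and `fetch_nodes` are illustrative and not part of this project.

```python
import json
from urllib.request import Request, urlopen

# Assumption: the REST API shares the UI's host and port (localhost:8088).
NODES_ENDPOINT = "http://localhost:8088/ws/v1/cluster/nodes"


def summarize_nodes(payload):
    """Return (hostname, state) pairs from a /ws/v1/cluster/nodes response."""
    # The API wraps the list as {"nodes": {"node": [...]}}; "nodes" can be
    # null when no NodeManager has registered yet.
    nodes = (payload.get("nodes") or {}).get("node") or []
    return [(node.get("nodeHostName"), node.get("state")) for node in nodes]


def fetch_nodes(url=NODES_ENDPOINT, timeout=5.0):
    """Query a running ResourceManager and summarize its nodes."""
    # Ask for JSON explicitly instead of relying on the server default.
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return summarize_nodes(json.load(response))
```

With the cluster running, `fetch_nodes()` should return one pair per registered NodeManager. A node that has joined the cluster normally reports the state `RUNNING`.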

Jupyter Notebook with starter notebooks

Access Jupyter Notebook at: http://localhost:8888/
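
Because the containers are started in detached mode, the web interfaces may not respond right away. The sketch below polls a TCP port until it accepts connections. It assumes the ports from the links in this README (8088 for Yarn, 8888 for Jupyter); `wait_for_port` is a hypothetical helper, not something shipped with the project.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect only shows that something is listening,
            # not that the service behind it has finished starting.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("localhost", 8888)` can be called before opening the Jupyter link, and `wait_for_port("localhost", 8088)` before opening the Yarn UI.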

Stopping the micro-lab

docker-compose down