Data Engineering Workspace

Detailed installation instructions (Docker Hadoop Cluster) can be found here.

Quick Start

Start the cluster (to run it in the background, use: docker-compose up -d)

docker-compose up

NOTE: Place your files (Jupyter notebooks, etc.) under the workspace/ folder.

To start only the client-node (Jupyter Notebook) when the Hadoop cluster is not needed:

docker-compose up client-node

To start the cluster with an alternative configuration for limited main memory:

docker-compose -f docker-compose.yml -f docker-compose-small.yml up
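When several -f files are given, Compose merges them in order, and values in later files override those in earlier ones. As a minimal sketch, the merged result can be inspected before anything is started by using the config subcommand. The guard and the messages below are illustrative additions, not part of this repository, and the sketch assumes it is run from the repository root.

```shell
# The two compose files named in the command above; the second one overrides the first.
compose_files="-f docker-compose.yml -f docker-compose-small.yml"

# Print the merged configuration without creating or starting any containers.
# The checks are illustrative: they skip the call when docker-compose or the
# compose files are not available in the current directory.
if command -v docker-compose >/dev/null 2>&1 \
   && [ -f docker-compose.yml ] && [ -f docker-compose-small.yml ]; then
    # $compose_files is intentionally unquoted so it splits into separate arguments.
    docker-compose $compose_files config
else
    echo "docker-compose or the compose files were not found; nothing to show"
fi
```

If the printed configuration looks right, the same -f arguments can be passed to up as shown above.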

WebUI URLs:

Update to the Latest Release

First, stop and remove the containers of the version currently in use:

docker-compose down

Then fetch the new tags from the remote repository:

git fetch --tags

Finally, switch to the latest release (tag):

git checkout $(git tag | sort -V | tail -1)
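The checkout command above relies on sort -V (version sort) to pick the newest tag. The following sketch shows why a plain sort would not be enough. The tag names are hypothetical and chosen only for illustration; the real ones come from git tag.

```shell
# Hypothetical tag names, for illustration only; real tags come from `git tag`.
tags='v1.2.0
v1.10.0
v1.9.1'

# Plain sort compares the names character by character, so "v1.9.1" ends up last.
lexical_latest=$(printf '%s\n' "$tags" | sort | tail -1)

# Version sort (-V) compares the numeric parts as numbers, so "v1.10.0" ends up last.
version_latest=$(printf '%s\n' "$tags" | sort -V | tail -1)

echo "plain sort picks:   $lexical_latest"
echo "version sort picks: $version_latest"
```

Note that sort -V is a GNU coreutils option and may not be available in every sort implementation.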