/datastore

Primary LanguageJupyter NotebookMIT LicenseMIT

Data-Stack composed by Elastic Stack, Grafana, Jupyter, Spark and DB-adapter to stream data from a kafka broker

Based on the following Components:

The designated way to feed data into the DataStack is from the Apache Kafka message bus via the separate Kafka Adapter which is based on the components:

Contents

  1. Requirements
  2. Getting started
  3. Trouble-shooting

Requirements

  1. Install Docker version 1.10.0+
  2. Install Docker Compose version 1.6.0+
  3. Clone this repository
  4. Configured Docker Swarm

Getting Started

This repository is divided into a swarm path and compose path, where the compose path serves as a staging environment.

Local deployment

Start the Data-Stack in a local testing environment. Make sure the bash scripts are executable and you have write priviledges.

git clone https://github.com/i-maintenance/datastore/
cd datastore
./start_datastore_local.sh

To watch the output of the datastore:

./show_datastore_local.sh

To stop the container use this command:

./stop_datastore_local.sh

Deploy in a docker swarm

This section requires a running docker swarm. If not already done, check out this video tutorial to set up a docker swarm cluster.

Start the Data-Stack using docker stack on a manager node:

If not already done, start a registry instance to make the cumstomized jupyter-image deployable: (we are using port 5001, as logstash's default port is 5000)

sudo docker service create --name registry --publish published=5001,target=5000 registry:2
curl 127.0.0.1:5001/v2/

This should output {}:

Now, deploy the Datastore in swarm. Make sure the bash scripts are executable, you have write priviledges and the mounted volumes in the compose-file swarm/DataStack/swarm_docker-compose.yml exist!

git clone https://github.com/i-maintenance/datastore/
cd datastore
./start_datastore_swarm.sh

To watch the status of the datastore: It may take several minutes for the services to be ready!

./show_datastore_swarm.sh

To stop the container use this command:

./stop_datastore_swarm.sh

Services

Give Kibana a minute to initialize, then access the Kibana web UI by hitting http://localhost:5601 with a web browser. The indexing of elasticsearch could last 15 minutes or more, so we have to be patient. On Kibana UI, DevTools we can trace the indexing success by hitting the REST request GET _cat/indices.

By default, the stack exposes the following ports:

  • 5000: Logstash TCP input
  • 9200: Elasticsearch HTTP
  • 9600: Logstash HTTP
  • 5601: Kibana: User Interface for data in Elasticsearch
  • 3030: Kafka-DataStack Adapter HTTP: This one requires the db-adapter
  • 8080: Swarm Visalizer: Watch all services on the swarm
  • 8888: Jupyter GUI: Run Python and R notebooks with Spark support on elastic data, the default password for Jupyter is datastore. Please change that using the config file swarm/DataStack/jupyter/jupyter_notebook_config.py

Tracing

Watch if everything worked fine with:

sudo docker service ls
sudo docker stack ps db-adapter
sudo docker service logs db-adapter_kafka -f

Data Feeding

In order to feed the Data-Stack with data, we can use the Kafka-DataStack Adapter.

The Kafka-Adapter automatically fetches data from the kafka message bus on topic SensorData. The selected topics can be specified in .env file of the Kafka-DataStack Adapter

To test the Data-Stack itself (without the kafka adapter), inject example log entries via TCP by:

$ nc hostname 5000 < /path/to/logfile.log

Proxy Config

If used behind an apache2 proxy, make sure to enable additional moduls

sudo a2enmod ssl rewrite proxy proxy_http proxy_wstunnel

Use the following config (note that the notebook will be available in https:/url/jupyter)

RewriteRule ^/jupyter$ jupyter/tree/ [R]
                RewriteRule ^/jupyter/$ jupyter/tree/ [R]

                <Location "/jupyter">
                    ProxyPass        http://localhost:8888/jupyter
                    ProxyPassReverse http://localhost:8888/jupyter
                </Location>

                <Location "/jupyter/api/kernels">
                    ProxyPass        ws://localhost:8888/jupyter/api/kernels
                    ProxyPassReverse ws://localhost:8888/jupyter/api/kernels
                </Location>
                <Location "/jupyter/terminals/websocket">
                        ProxyPass        ws://localhost:8888/jupyter/terminals/websocket
                        ProxyPassReverse ws://localhost:8888/jupyter/terminals/websocket
                </Location>


                #ProxyPass /jupyter/api/kernels/ ws://127.0.0.1:8888/jupyter/api/kernels/
                #ProxyPassReverse /jupyter/api/kernels/ ws://127.0.0.1:8888/jupyter/api/kernels/

                #ProxyPass /jupyter http://localhost:8888/jupyter connectiontimeout=15 timeout=30
                #ProxyPassReverse /jupyter http://localhost:8888/jupyter

                ProxyPass /jupyter/tree http://127.0.0.1:8888/jupyter/tree
                ProxyPassReverse /jupyter/tree http://127.0.0.1:8888/jupyter/tree


                #ProxyPass /jupyter http://127.0.0.1:8888/jupyter/
                #ProxyPassReverse /jupyter http://127.0.0.1:8888/jupyter/


                #<Location ~ "/(user/[^/]*)/(api/kernels/[^/]+/channels|terminals/websocket)/?">
                #       ProxyPass ws://localhost:8888/jupyter
                #       ProxyPassReverse ws://localhost:8888/jupyter
                #</Location>

For more help, see here

Trouble-shooting

Can't apt-get update in Dockerfile:

Restart the service

sudo service docker restart

or add the file /etc/docker/daemon.json with the content:

{
    "dns": [your_dns, "8.8.8.8"]
}

where your_dns can be found with the command:

nmcli device show <interfacename> | grep IP4.DNS

Traceback of non zero code 4 or 128:

Restart service with sudo service docker restart

or add your dns address as described above

Elasticsearch crashes instantly:

Check permission of elasticsearch/data.

sudo chown -r USER:USER .
sudo chmod -R 777 .

or remove redundant docker installations or reinstall it

Error starting userland proxy: listen tcp 0.0.0.0:9200: bind: address already in use

Bring down other services, or change the hosts port number in docker-compose.yml.

Find all running services by:

sudo docker ps

errors while removing docker containers:

Remove redundant docker installations

"entire heap max virtual memory areas vm.max_map_count [...] likely too low, increase to at least [262144]"

Run on host machine:

sudo sysctl -w vm.max_map_count=262144