Data-Stack composed of the Elastic Stack, Grafana, Jupyter, Spark, and a DB-adapter to stream data from a Kafka broker.
The designated way to feed data into the Data-Stack is from the Apache Kafka message bus via the separate Kafka Adapter, which is based on the following components:
- Kafka client librdkafka, version 0.11.1
- Python Kafka module confluent-kafka-python, version 0.9.1.2
Requirements:
- Install Docker version 1.10.0+
- Install Docker Compose version 1.6.0+
- Clone this repository
- A configured Docker Swarm
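These requirements can be verified up front from the shell; a quick check (the swarm-state query assumes an engine new enough for swarm mode):

```bash
# Verify the requirements listed above:
docker --version            # should report 1.10.0 or newer
docker-compose --version    # should report 1.6.0 or newer
docker info --format '{{.Swarm.LocalNodeState}}'   # prints "active" on a configured swarm node
```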
This repository is divided into a swarm path and a compose path, where the compose path serves as a staging environment.
Start the Data-Stack in a local testing environment. Make sure the bash scripts are executable and that you have write privileges.
git clone https://github.com/i-maintenance/datastore/
cd datastore
./start_datastore_local.sh
To watch the output of the datastore:
./show_datastore_local.sh
To stop the containers, use this command:
./stop_datastore_local.sh
This section requires a running Docker swarm. If not already done, check out this video tutorial to set up a Docker swarm cluster.
Start the Data-Stack using docker stack on a manager node:
If not already done, start a registry instance to make the customized Jupyter image deployable (we use port 5001, as Logstash's default port is 5000):
sudo docker service create --name registry --publish published=5001,target=5000 registry:2
curl 127.0.0.1:5001/v2/
This should output {}.
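The start script is expected to build and push the customized Jupyter image to this registry; if that ever has to be done by hand, a hedged sketch (the image tag here is hypothetical, use whatever swarm/DataStack/swarm_docker-compose.yml actually references):

```bash
# Hypothetical tag; adjust to the image name used in swarm_docker-compose.yml.
sudo docker build -t 127.0.0.1:5001/jupyter swarm/DataStack/jupyter
sudo docker push 127.0.0.1:5001/jupyter
```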
Now, deploy the datastore in the swarm. Make sure the bash scripts are executable, you have write privileges, and the mounted volumes in the compose file swarm/DataStack/swarm_docker-compose.yml exist!
git clone https://github.com/i-maintenance/datastore/
cd datastore
./start_datastore_swarm.sh
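If the deployment fails because a mounted host path is missing, the directories can be created before running the start script; a sketch, assuming elasticsearch/data is one of the volume paths (compare the troubleshooting notes below):

```bash
# Assumed volume path; the authoritative list is in swarm/DataStack/swarm_docker-compose.yml.
mkdir -p elasticsearch/data
sudo chmod -R 777 elasticsearch/data
```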
To watch the status of the datastore (it may take several minutes for the services to be ready):
./show_datastore_swarm.sh
To stop the containers, use this command:
./stop_datastore_swarm.sh
Give Kibana a minute to initialize, then access the Kibana web UI at http://localhost:5601 with a web browser.
The initial indexing in Elasticsearch can take 15 minutes or more, so be patient.
In the Kibana UI under DevTools, we can trace the indexing progress with the REST request:
GET _cat/indices
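The same check also works from the shell against the exposed Elasticsearch HTTP port (9200, see the port list below):

```bash
# List all indices via the Elasticsearch REST API:
curl -XGET 'http://localhost:9200/_cat/indices?v'
```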
By default, the stack exposes the following ports:
- 5000: Logstash TCP input
- 9200: Elasticsearch HTTP
- 9600: Logstash HTTP
- 5601: Kibana: User Interface for data in Elasticsearch
- 3030: Kafka-DataStack Adapter HTTP: This one requires the db-adapter
- 8080: Swarm Visualizer: watch all services in the swarm
- 8888: Jupyter GUI: run Python and R notebooks with Spark support on Elastic data. The default password for Jupyter is datastore. Please change it in the config file swarm/DataStack/jupyter/jupyter_notebook_config.py (a new password hash can be generated as shown below).
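A replacement password hash for jupyter_notebook_config.py can be generated with the notebook package itself; a sketch, assuming Jupyter Notebook is installed on the host:

```bash
# Generate a salted password hash and paste it into c.NotebookApp.password:
python -c "from notebook.auth import passwd; print(passwd())"
```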
Check whether everything is working with:
sudo docker service ls
sudo docker stack ps db-adapter
sudo docker service logs db-adapter_kafka -f
In order to feed the Data-Stack with data, we can use the Kafka-DataStack Adapter. The Kafka Adapter automatically fetches data from the Kafka message bus on the topic SensorData. The selected topics can be specified in the .env file of the Kafka-DataStack Adapter.
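To verify that messages actually arrive on the bus, the topic can be inspected with kafkacat, a CLI built on the same librdkafka library as the adapter; a minimal sketch, assuming kafkacat is installed and the broker listens on localhost:9092:

```bash
# Print everything currently on the SensorData topic, then exit:
kafkacat -C -b localhost:9092 -t SensorData -o beginning -e
```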
To test the Data-Stack itself (without the Kafka adapter), inject example log entries via TCP:
$ nc hostname 5000 < /path/to/logfile.log
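For a quick one-off test without a log file, a single entry can be piped in the same way; the JSON fields here are made up for illustration:

```bash
# Send one hypothetical log entry to the Logstash TCP input on port 5000:
echo '{"sensor": "demo", "value": 42}' | nc localhost 5000
```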
If used behind an Apache2 proxy, make sure to enable the additional modules:
sudo a2enmod ssl rewrite proxy proxy_http proxy_wstunnel
Use the following config (note that the notebook will be available at https://<url>/jupyter):
RewriteRule ^/jupyter$ jupyter/tree/ [R]
RewriteRule ^/jupyter/$ jupyter/tree/ [R]
<Location "/jupyter">
ProxyPass http://localhost:8888/jupyter
ProxyPassReverse http://localhost:8888/jupyter
</Location>
<Location "/jupyter/api/kernels">
ProxyPass ws://localhost:8888/jupyter/api/kernels
ProxyPassReverse ws://localhost:8888/jupyter/api/kernels
</Location>
<Location "/jupyter/terminals/websocket">
ProxyPass ws://localhost:8888/jupyter/terminals/websocket
ProxyPassReverse ws://localhost:8888/jupyter/terminals/websocket
</Location>
ProxyPass /jupyter/tree http://127.0.0.1:8888/jupyter/tree
ProxyPassReverse /jupyter/tree http://127.0.0.1:8888/jupyter/tree
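After adding the configuration, validate and reload Apache so the rewrite and proxy rules take effect:

```bash
# Check the syntax, then reload Apache:
sudo apachectl configtest
sudo systemctl reload apache2
```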
For more help, see here
Restart the Docker service:
sudo service docker restart
or add the file /etc/docker/daemon.json with the content:
{
  "dns": ["your_dns", "8.8.8.8"]
}
where your_dns can be found with the command:
nmcli device show <interfacename> | grep IP4.DNS
Restart the service with:
sudo service docker restart
or add your DNS address as described above.
Check the permissions of elasticsearch/data:
sudo chown -R USER:USER .
sudo chmod -R 777 .
or remove redundant Docker installations, or reinstall Docker.
Bring down other services, or change the host's port number in docker-compose.yml. Find all running services with:
sudo docker ps
Remove redundant Docker installations.
"entire heap max virtual memory areas vm.max_map_count [...] likely too low, increase to at least [262144]"
Run on host machine:
sudo sysctl -w vm.max_map_count=262144
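To make this setting survive reboots, it can also be persisted in /etc/sysctl.conf:

```bash
# Persist the setting across reboots:
echo 'vm.max_map_count=262144' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
```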