This repository contains the Apache Flink job used to enrich the sensor data stream as part of the WaterGridSense4.0 Analytics Platform.
This job requires a running instance of our Analytics Platform and a running HDFS cluster.
The job is configured by adjusting the settings in the properties file `src/main/resources/enrichment_job.properties`. Note that all properties with capitalized placeholder values must be changed, e.g. replace `KAFKA_HOST_OR_IP`, `RABBITMQ_PASSWORD`, etc. with the correct values for your cluster.
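The placeholders might look like the following excerpt. This is a hypothetical sketch, not the actual contents of the file; only `KAFKA_HOST_OR_IP` and `RABBITMQ_PASSWORD` are mentioned above, and the key names shown here are illustrative:

```
# Illustrative excerpt - the real keys are defined in enrichment_job.properties.
# Replace each CAPITALIZED placeholder with a value from your cluster.
kafka.bootstrap.servers=KAFKA_HOST_OR_IP:9092
rabbitmq.password=RABBITMQ_PASSWORD
```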
To build the pipelines, just use `make`:

```
make
```

or use Maven directly instead:

```
mvn clean compile package
```
The target artifact will be created as `EnrichmentJob/target/EnrichmentJob-1.0-SNAPSHOT.jar`.
Submit the compiled artifact to Flink. Use the `Parallelism` setting to configure the cluster size in accordance with the number of Flink taskmanagers and Kafka brokers. Other parameters that can be passed to the job are:

- `--initialize` - initialize the Cassandra DB by creating the keyspace and parameter table
- `--mqtt` - use RabbitMQ as the data stream broker instead of Kafka
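Assuming a standard Flink CLI setup, a submission could look like the following sketch; the parallelism value is illustrative and should match your number of taskmanagers and Kafka brokers:

```
# Submit the job with parallelism 4 and create the Cassandra keyspace
# and parameter table on first run (illustrative values).
flink run -p 4 EnrichmentJob/target/EnrichmentJob-1.0-SNAPSHOT.jar --initialize
```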