Stream Processing with Apache Flink

This repository contains the code for the book Stream Processing: Hands-on with Apache Flink.

Table of Contents

  1. Environment Setup
  2. Register UDF
  3. Deploy a JAR file

Environment Setup

To run the code samples you will need a Kafka and a Flink cluster up and running. Alternatively, you can run the Flink examples from within your favorite IDE, in which case you don't need a Flink cluster.

If you want to run the examples inside a Flink cluster, run the following command to start the services:

docker-compose up

When the cluster is up and running, run the following command for the Redpanda setup:

./redpanda-setup.sh

or this command for the Kafka setup:

./kafka-setup.sh
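The setup scripts live in this repository; as a rough illustration only, a Kafka setup script like this typically just creates the topics the examples read from. In the sketch below, the container name `kafka`, the partition count, and the replication factor are assumptions; the topic name `transactions` is taken from the SQL examples further down.

```shell
# Hypothetical sketch of what a Kafka setup script may do:
# create the input topic used by the examples.
docker exec -it kafka \
  kafka-topics --bootstrap-server localhost:9092 \
  --create --topic transactions \
  --partitions 1 --replication-factor 1
```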

Register UDF

CREATE FUNCTION maskfn  AS 'io.streamingledger.udfs.MaskingFn'      LANGUAGE JAVA USING JAR '/opt/flink/jars/spf-0.1.0.jar';
CREATE FUNCTION splitfn AS 'io.streamingledger.udfs.SplitFn'        LANGUAGE JAVA USING JAR '/opt/flink/jars/spf-0.1.0.jar';
CREATE FUNCTION lookup  AS 'io.streamingledger.udfs.AsyncLookupFn'  LANGUAGE JAVA USING JAR '/opt/flink/jars/spf-0.1.0.jar';
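The UDF implementations themselves ship inside spf-0.1.0.jar. As a rough illustration only (the actual io.streamingledger.udfs.MaskingFn may behave differently), the core of a masking function could look like the hypothetical class below; in Flink, logic like this would be wrapped in a class extending org.apache.flink.table.functions.ScalarFunction with an eval method.

```java
// Hypothetical core of a masking UDF: keep the last four characters
// of a value and replace everything before them with '*'.
// The real MaskingFn in the book's jar may use different rules.
public class MaskingCore {

    public static String mask(String value) {
        // Nothing to mask for null or very short inputs.
        if (value == null || value.length() <= 4) {
            return value;
        }
        int hidden = value.length() - 4;
        return "*".repeat(hidden) + value.substring(hidden);
    }

    public static void main(String[] args) {
        System.out.println(mask("1234-5678-9012-3456"));
    }
}
```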

CREATE TEMPORARY VIEW sample AS
SELECT * 
FROM transactions 
LIMIT 10;

SELECT transactionId, maskfn(UUID()) AS maskedCN FROM sample;
SELECT * FROM transactions, LATERAL TABLE(splitfn(operation));

SELECT 
  transactionId,
  serviceResponse, 
  responseTime 
FROM sample, LATERAL TABLE(lookup(transactionId));

Deploy a JAR file

  1. Package the application and create an executable jar file:

mvn clean package

  2. Copy the jar file into the directory of jar files that are included in the custom Flink images.

  3. Start the cluster to build the new images by running:

docker-compose up

  4. Deploy the Flink job:

docker exec -it jobmanager ./bin/flink run \
  --class io.streamingledger.datastream.BufferingStream \
  jars/spf-0.1.0.jar
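After submitting the job, you can verify it is running with the standard Flink CLI `list` command (a usage sketch; the `jobmanager` container name matches the deploy command above):

```shell
# List the jobs running on the cluster to verify the deployment.
docker exec -it jobmanager ./bin/flink list
```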