kafka-hdfs-zoomdata

Real-time HDFS sink and Zoomdata post

This application takes 6 parameters; a sketch of how they could be wired together follows the list:

BROKERS: Comma-separated list of Kafka brokers, each with its port number.

KAFKA TOPIC: Name of the Kafka topic to read events from.

DESTINATION URL: Location where the files are saved. Can be any supported filesystem location: local file system, HDFS, Bigstep Datalake, etc.

KAFKA TOPIC OFFSET: Where the consumer should start reading events from the topic. Choose between earliest and latest.

OUTPUT FORMAT: Format the data is written in. Currently the only available option is parquet.

TIME: Number of seconds between successive reads from the Kafka topic.
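As a rough sketch (not the repository's actual source), the six arguments could drive a Spark Structured Streaming job along the following lines. The object name, the checkpoint location, and the assumption that the spark-sql-kafka-0-10 connector is on the classpath are all illustrative.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.ProcessingTime

object KafkaHdfsSink {
  def main(args: Array[String]): Unit = {
    // Positional arguments, in the same order as the spark-submit example below.
    val Array(brokers, topic, destination, offset, format, seconds) = args

    val spark = SparkSession.builder().appName("kafka-hdfs-zoomdata").getOrCreate()

    // Read the Kafka topic as a streaming DataFrame.
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", brokers) // BROKERS
      .option("subscribe", topic)                 // KAFKA TOPIC
      .option("startingOffsets", offset)          // earliest or latest
      .load()
      .selectExpr("CAST(value AS STRING) AS value")

    // Write micro-batches to the destination path every TIME seconds.
    val query = events.writeStream
      .format(format)                             // parquet
      .option("path", destination)                // DESTINATION URL
      .option("checkpointLocation", destination + "/_checkpoint")
      .trigger(ProcessingTime(s"$seconds seconds"))
      .start()

    query.awaitTermination()
  }
}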

The Zoomdata URL, user, and password must be configured in the application itself.
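A minimal illustration of what that configuration could look like; the object and field names below are placeholders, not the repository's actual code:

object ZoomdataConfig {
  // Placeholder values: edit these before running sbt package.
  val url: String      = "https://zoomdata.example.com:8443/zoomdata"
  val user: String     = "admin"
  val password: String = "changeme"
}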

To compile, use: sbt package

To submit the job, use:

/opt/spark-2.1.1-bin-hadoop2.6/bin/spark-submit --master yarn target/scala-2.11/kafka-hdfs-zoomdata_2.11-1.0.jar BROKER_FQDN_1:9092,BROKER_FQDN_2:9092,BROKER_FQDN_N:9092 TOPIC_NAME /path/to/write/files OFFSET FORMAT TIME
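For example, with two placeholder broker hostnames, a topic named events, an HDFS destination path, the earliest offset, Parquet output, and a 60-second interval:

/opt/spark-2.1.1-bin-hadoop2.6/bin/spark-submit --master yarn target/scala-2.11/kafka-hdfs-zoomdata_2.11-1.0.jar kafka01.example.com:9092,kafka02.example.com:9092 events hdfs:///data/events earliest parquet 60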