Programmatic implementations of the Cloudera Envelope traffic sample in Scala, Java and Python.
Tested with: CDH 5.10 (Spark 1.6), Cloudera Kafka 2.1 (Apache 0.10), Kudu 1.2
-
Build from the project root directory:
mvn clean package
-
Create the target Kudu table using the Envelope traffic example Impala DDL script
-
Create the Kafka traffic topic (replication and partitions set to 1, for testing):
/usr/bin/kafka-topics --create --zookeeper curtis-pa-2.vpc.cloudera.com:2181 --replication-factor 1 --topic traffic --partitions 1
-
Produce simulated data on the topic (replace the kafka broker/port list parameter):
while true; do echo "`date +%s%N | cut -b1-13`,$((RANDOM % 100))"; sleep 1; done | /usr/bin/kafka-console-producer --broker-list curtis-pa-1:9092 --topic traffic
-
Run either the Scala, Java or Python Spark Streaming application (replace kafka brokers and kudu masters parameters):
spark-submit --class com.cloudera.fce.curtis.spark_stream_to_kudu.KafkaToKuduJava target/spark_stream_to_kudu-1.0-jar-with-dependencies.jar kafka-broker-1:9092,... kudu-master-1:7051,...
spark-submit --class com.cloudera.fce.curtis.spark_stream_to_kudu.KafkaToKuduScala target/spark_stream_to_kudu-1.0-jar-with-dependencies.jar kafka-broker-1:9092,... kudu-master-1:7051,...
PySpark: after building Scala/Java code in step 1, a kudu-spark_2...jar file should be available, typically under your ~/.m2 path
spark-submit --jars /var/lib/hadoop-hdfs/.m2/repository/org/apache/kudu/kudu-spark_2.10/1.2.0-cdh5.10.0/kudu-spark_2.10-1.2.0-cdh5.10.0.jar src/main/python/kafka_to_kudu.py kafka-broker-1:9092,... kudu-master-1:7051 ,...
-
View the results in Kudu from Impala:
select * from traffic_conditions order by as_of_time;