To run this code:

Run required servers in Local Mode

If you need Kafka (and ZooKeeper), run the class KafkaLocal with its input parameter set to "true"; the local Kafka instance is then started together with ZooKeeper.
If you need Kafka and OpenTSDB, run the classes HbaseLocal and KafkaLocal, with KafkaLocal's input parameter set to "false".

Generate the avro class

Download avro-tools from http://mvnrepository.com/artifact/org.apache.avro/avro-tools/1.8.1 and run:

java -jar path/to/avro-tools-1.8.1.jar compile schema src/main/resources/Event.avsc ./src/main/scala
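The download and compile steps above can be combined in a small script. This is only a sketch: the function names avro_tools_url and generate_avro_classes are hypothetical, and the direct Maven Central link is an assumption (the page linked above is the mvnrepository.com listing, not the jar itself). It also assumes curl and java are on the PATH and that it is run from the project root.

```shell
#!/bin/sh
# Direct Maven Central link for a given avro-tools version.
# Assumption: the jar follows the standard Maven repository layout.
avro_tools_url() {
  printf 'https://repo1.maven.org/maven2/org/apache/avro/avro-tools/%s/avro-tools-%s.jar\n' "$1" "$1"
}

# Download the jar if it is not already in the current directory,
# then generate the classes from the Event schema.
generate_avro_classes() {
  version=$1
  jar="avro-tools-$version.jar"
  [ -f "$jar" ] || curl -fL -o "$jar" "$(avro_tools_url "$version")" || return 1
  java -jar "$jar" compile schema src/main/resources/Event.avsc ./src/main/scala
}

# Usage (from the project root):
#   generate_avro_classes 1.8.1
```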

Cloudera Manager Analysis

To check the number of messages in Kafka using Cloudera Manager:

Cloudera Manager -> Kafka -> Chart Library -> Topics (click on the topic to analyze)

To check the number of consumed messages in Kafka using Cloudera Manager:

Cloudera Manager -> YARN -> Web UI -> ResourceManager Web UI -> ApplicationMaster -> Streaming

Packaging

To generate the jar with all its dependencies in the ./lib folder:

sbt universal:packageZipTarball

The tarball will be located in ./spark-opentsdb-examples/target/universal

To update only the jar, without the external dependencies:

sbt clean package

Run the consumer in Spark

  spark-submit --executor-memory 1200M \
  --jars $(JARS=("$(pwd)/lib"/*.jar); IFS=,; echo "${JARS[*]}") \
  --driver-class-path /etc/hbase/conf \
  --conf spark.executor.extraClassPath=/etc/hbase/conf \
  --conf spark.executor.extraJavaOptions=-Djava.security.auth.login.config=/tmp/jaas.conf \
  --conf "spark.driver.extraJavaOptions=-Dspark-opentsdb-exmaples.hbase.master=eligo105.eligotech.private:60000 -Dspark-opentsdb-exmaples.zookeeper.host=eligo105.eligotech.private:2181/kafka -Dspark-opentsdb-exmaples.kafka.brokers=192.168.2.108:9092" \
  --master yarn --deploy-mode client \
  --keytab flanotte.keytab \
  --principal flanotte@SERVER.ELIGOTECH.COM \
  --class com.cgnal.kafkaAvro.consumers.example.OpenTSDBConsumerMain spark-opentsdb-examples_2.10-1.0.0-SNAPSHOT.jar  false flanotte.keytab flanotte@SERVER.ELIGOTECH.COM
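Two parts of the command above are easy to get wrong: the --jars value must be a single comma-separated list of jar paths, and the driver options are several -D properties that must reach spark-submit as one quoted value. The following sketch builds both strings. The function names join_jars and driver_java_opts are hypothetical helpers, not part of the project; the property prefix is copied verbatim from the command above.

```shell
#!/bin/sh
# Join every *.jar in a directory into one comma-separated string,
# which is the format spark-submit expects for --jars.
join_jars() {
  dir=$1
  joined=""
  for jar in "$dir"/*.jar; do
    # If nothing matches, the glob stays literal; skip that entry.
    [ -e "$jar" ] || continue
    joined="${joined:+$joined,}$jar"
  done
  printf '%s\n' "$joined"
}

# Build the driver JVM options as one space-separated string.
# Arguments: HBase master, ZooKeeper host (with optional chroot), Kafka brokers.
driver_java_opts() {
  prefix="-Dspark-opentsdb-exmaples"
  printf '%s\n' "${prefix}.hbase.master=$1 ${prefix}.zookeeper.host=$2 ${prefix}.kafka.brokers=$3"
}
```

In the spark-submit call these would be used as --jars "$(join_jars "$(pwd)/lib")" and --conf "spark.driver.extraJavaOptions=$(driver_java_opts HOST:60000 HOST:2181/kafka HOST:9092)", where the outer double quotes keep each value a single argument. Unlike the inline array expansion, the loop yields an empty string rather than a literal *.jar when ./lib contains no jars.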

spark-cdh-template

A Spark sbt template that you can use for bootstrapping your Spark projects:

http://www.davidgreco.me/blog/2015/04/11/a-spark-sbt-based-project-template/

http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/