Spark task that extracts, categorises and builds an Elasticsearch index for events data crawled with events-crawler.
Create a tasks directory on the master node
cd $SPARK_HOME
mkdir tasks
Include the HBase jars on the Spark classpath by adding the following line to conf/spark-defaults.conf
spark.executor.extraClassPath=/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/hbase/lib/*
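As a sketch, the entry can also be appended from the shell. CONF_DIR below is a scratch directory standing in for $SPARK_HOME/conf, and the parcel path is the CDH 5.4.7 one from above; adjust it to the installed CDH version:

```shell
# Sketch: append the HBase lib glob to spark-defaults.conf.
# CONF_DIR stands in for $SPARK_HOME/conf; the parcel path matches the line above.
CONF_DIR=$(mktemp -d)
HBASE_LIB=/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/hbase/lib
echo "spark.executor.extraClassPath=${HBASE_LIB}/*" >> "${CONF_DIR}/spark-defaults.conf"
cat "${CONF_DIR}/spark-defaults.conf"
```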
Set $ES_NODES and $ZOOKEEPER_QUORUM on the Spark executor worker nodes
export ES_NODES=vps1234.net:9200
export ZOOKEEPER_QUORUM=vps1234.net,vps5678.net
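To make these variables available to executors on every start-up rather than only in the current shell, one option (a sketch, assuming the standard conf/spark-env.sh mechanism; ENV_FILE is a stand-in path and the hosts are the examples above) is:

```shell
# Sketch: persist the exports in spark-env.sh so executors pick them up.
# ENV_FILE stands in for $SPARK_HOME/conf/spark-env.sh on each worker node.
ENV_FILE=$(mktemp)
cat >> "$ENV_FILE" <<'EOF'
export ES_NODES=vps1234.net:9200
export ZOOKEEPER_QUORUM=vps1234.net,vps5678.net
EOF
. "$ENV_FILE"
echo "ES_NODES=$ES_NODES"
```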
Build the project and copy the jar to the server
cd $EVENTS_HOME
sbt clean assembly
scp target/scala-2.10/events-fetcher-assembly-0.0.1.jar user@host:$SPARK_HOME/tasks
Submit the Spark task. The HBase lib directory must be passed via --driver-class-path so the HBase jars are also on the driver classpath
./bin/spark-submit --class uk.vitalcode.events.fetcher.Client \
--master yarn-client \
--num-executors 1 \
--driver-memory 300m \
--executor-memory 300m \
--executor-cores 1 \
--queue thequeue \
--driver-class-path /opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/hbase/lib/*: \
tasks/events-fetcher-assembly-0.0.1.jar
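Once the job completes, the newly built index should appear in Elasticsearch's index catalogue. A sketch of a quick check via the ES REST API (the index name produced by the job is not specified here, so this simply lists all indices; ES_NODES is the host:port exported earlier):

```shell
# Sketch: build the _cat/indices request against the ES node set earlier.
# Falls back to the example host from this guide if ES_NODES is unset.
ES_NODES=${ES_NODES:-vps1234.net:9200}
CHECK="curl -s http://${ES_NODES}/_cat/indices?v"
echo "$CHECK"   # run this against the cluster to list the indices
```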
Run the project tests
sbt test