- Real-time Spark Stream Monitoring over SocketStream.
- Using Apache Spark and Lightning Graph server.
Apache Spark 2.x streaming applications built on Datasets do not currently get a Streaming tab in the Spark UI.
This project shows how to build a real-time graph monitoring system with Lightning-viz, where we can plot and monitor any custom parameter we need.
There are 3 main components in this project as shown in the picture below:
- SparkApplication: A Spark application that receives streaming data from a socket stream and performs a simple word count.
- Lightning Server: Plots live stats, in real time, for any custom parameters the user wants to monitor within the Spark application.
- StreamingListener: A custom streaming listener, registered with the application, that posts live stats to the Lightning Server.
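The wiring between these components can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the project's actual code: `BatchStats`, `BatchStatsListener`, `WordCountWithStats` and the `publish` callback are hypothetical names, the 5-second batch interval is assumed, and the callback only prints the stats where the real application would post them to the Lightning server. It relies on Spark's `StreamingListener` trait and `StreamingContext.addStreamingListener` from the DStream API.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

// Hypothetical container for the two metrics plotted per batch.
final case class BatchStats(batchTimeMs: Long, numRecords: Long, processingTimeMs: Long)

// Forwards per-batch stats to a caller-supplied callback (hypothetical design).
class BatchStatsListener(publish: BatchStats => Unit) extends StreamingListener {
  override def onBatchCompleted(batchCompleted: StreamingListenerBatchCompleted): Unit = {
    val info = batchCompleted.batchInfo
    publish(BatchStats(
      batchTimeMs = info.batchTime.milliseconds,
      numRecords = info.numRecords,
      // processingDelay is None when start or end time is unknown
      processingTimeMs = info.processingDelay.getOrElse(0L)))
  }
}

object WordCountWithStats {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("sparkmonitoring-with-lightning")
      .setIfMissing("spark.master", "local[2]")
    val ssc = new StreamingContext(conf, Seconds(5)) // assumed batch interval

    // Stand-in for posting to the Lightning server: just print the stats.
    ssc.addStreamingListener(new BatchStatsListener(stats => println(stats)))

    // Word count over lines read from the socket stream.
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .map(word => (word, 1L))
      .reduceByKey(_ + _)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Keeping the listener independent of the plotting backend, by passing in a callback, makes it possible to exercise the stats extraction without a running Lightning server.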
The following picture shows a side-by-side view of the Spark metrics page and the corresponding live graphs of processing time per batch and number of records per batch.
This project uses Maven, Scala 2.11, Spark 2.x and Java 1.8.
$ mvn clean install
First of all, the application depends on the Lightning Graph Server. The default server URL is http://localhost:3000. You can install the server on your own machine; the good part is that installation is very simple (close to a one-click process).
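Before launching the Spark application, it can be handy to confirm that the server answers at the configured URL. The following is a minimal sketch that uses only the JDK's `HttpURLConnection`; the `LightningCheck` object and its `isReachable` method are hypothetical helpers, not part of this project or of Lightning, and the check assumes that any HTTP response below 500 means the server is up.

```scala
import java.net.{HttpURLConnection, URL}
import scala.util.Try

object LightningCheck {
  // Returns true if an HTTP GET to serverUrl yields a status code below 500.
  // Any failure (malformed URL, refused connection, timeout) yields false.
  def isReachable(serverUrl: String, timeoutMs: Int = 2000): Boolean =
    Try {
      val conn = new URL(serverUrl).openConnection().asInstanceOf[HttpURLConnection]
      conn.setConnectTimeout(timeoutMs)
      conn.setReadTimeout(timeoutMs)
      conn.setRequestMethod("GET")
      try conn.getResponseCode < 500
      finally conn.disconnect()
    }.getOrElse(false)
}
```

For example, `LightningCheck.isReachable("http://localhost:3000")` could be called at startup to fail fast with a clear message instead of losing stats silently.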
Once the Lightning server is up and running, we can start the Spark application in either of the two ways listed below:
- standalone jar
$ scala -extdirs "$SPARK_HOME/lib" <path-to-spark-streaming-monitoring-with-lightning.jar> --master <master> <cmd-line-args>
- spark-submit
$ spark-submit --master <master> <path-to-spark-streaming-monitoring-with-lightning.jar> <cmd-line-args>
The default value for master is local[2].
Optionally, you can pass configuration parameters, such as the Lightning server URL, on the command line. To see the list of configurable parameters, run:
$ spark-submit <path-to-spark-streaming-monitoring-with-lightning.jar> --help
OR
$ scala -extdirs "$SPARK_HOME/lib" <path-to-spark-streaming-monitoring-with-lightning.jar> -h
Help content will look something like this:
This is a Spark Streaming application which receives data from SocketStream and does word count.
You can monitor batch size and batch processing time by real-time graph that's rendered using
Lightning graph server. So, this application needs lightningServerUrl and SocketStreamHost
and Port from where to listen to..
Usage: spark-submit realtime-spark-monitoring-with-lightning*.jar [options]
Options:
-h, --help
-m, --master <master_url> spark://host:port, mesos://host:port, yarn, or local.
-n, --name <name> A name of your application.
-ssh, --socketStreamHost <hostname> Default: localhost
-ssp, --socketStreamPort <port> Default: 9999
-bi, --batchInterval <batch interval in ms> Default: 5
-ls, --lightningServerUrl <hostname> T Default: http://localhost:3000
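The option list above could be handled by a small parser along these lines. This is a sketch only and not the project's actual parser: the `CliArgs` object is hypothetical, and mapping each flag to the key names used in the properties file is an assumption.

```scala
object CliArgs {
  // Short and long flags mapped to configuration keys (assumed key names).
  private val flagToKey: Map[String, String] = Map(
    "-m" -> "sparkMaster", "--master" -> "sparkMaster",
    "-n" -> "appName", "--name" -> "appName",
    "-ssh" -> "socketStreamHost", "--socketStreamHost" -> "socketStreamHost",
    "-ssp" -> "socketStreamPort", "--socketStreamPort" -> "socketStreamPort",
    "-bi" -> "batchInterval", "--batchInterval" -> "batchInterval",
    "-ls" -> "lightningServerUrl", "--lightningServerUrl" -> "lightningServerUrl")

  // Returns the parsed key/value pairs, or an error message for bad input.
  @scala.annotation.tailrec
  def parse(args: List[String],
            acc: Map[String, String] = Map.empty): Either[String, Map[String, String]] =
    args match {
      case Nil => Right(acc)
      case ("-h" | "--help") :: _ => Right(acc + ("help" -> "true"))
      case flag :: value :: rest if flagToKey.contains(flag) =>
        parse(rest, acc + (flagToKey(flag) -> value))
      case flag :: Nil if flagToKey.contains(flag) => Left(s"Missing value for $flag")
      case other :: _ => Left(s"Unknown option: $other")
    }
}
```

For instance, `CliArgs.parse(List("--master", "local[4]", "-ssp", "7777"))` yields a map with `sparkMaster` and `socketStreamPort` set, while an unrecognised flag yields a `Left` with an error message.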
Default values for all command-line options are also defined in a configuration file. You can edit that file directly instead of passing the options on every run or submit command. The config file is at /src/main/resources/dev/application.properties. It contains the following parameters:
...
sparkMaster=local[2]
socketStreamPort=9999
socketStreamHost=localhost
appName=sparkmonitoring-with-lightning
batchInterval=5
lightningServerUrl=http://localhost:3000
...
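One way to combine these file defaults with command-line overrides is sketched below. It is an illustration under assumptions, not the project's loader: `AppConfig` and `ConfigLoader` are hypothetical names, the properties are parsed from a string to keep the sketch self-contained (a real application would more likely read the file from the classpath), and the fallback values simply mirror the defaults listed above.

```scala
import java.io.StringReader
import java.util.Properties

// Hypothetical typed view of the parameters listed in application.properties.
final case class AppConfig(
  sparkMaster: String,
  appName: String,
  socketStreamHost: String,
  socketStreamPort: Int,
  batchInterval: Long,
  lightningServerUrl: String)

object ConfigLoader {
  // Parses key=value text in java.util.Properties format.
  def parse(text: String): Properties = {
    val props = new Properties()
    props.load(new StringReader(text))
    props
  }

  // Precedence: command-line overrides, then file values, then built-in defaults.
  def load(props: Properties, overrides: Map[String, String] = Map.empty): AppConfig = {
    def value(key: String, default: String): String =
      overrides.getOrElse(key, props.getProperty(key, default))
    AppConfig(
      sparkMaster = value("sparkMaster", "local[2]"),
      appName = value("appName", "sparkmonitoring-with-lightning"),
      socketStreamHost = value("socketStreamHost", "localhost"),
      socketStreamPort = value("socketStreamPort", "9999").toInt,
      batchInterval = value("batchInterval", "5").toLong,
      lightningServerUrl = value("lightningServerUrl", "http://localhost:3000"))
  }
}
```

With this precedence, `ConfigLoader.load(props, Map("socketStreamPort" -> "7777"))` keeps every file value except the port, which matches the idea that the file holds defaults and the command line tweaks them per run.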