Udacity Data Streaming Nanodegree Program: SF Crime Statistics with Spark Streaming
-
How did changing values on the SparkSession property parameters affect the throughput and latency of the data?
- Answer - Changing them may affect how many rows each micro-batch processes per second (processedRowsPerSecond), but we can tune this with maxOffsetsPerTrigger and maxRatePerPartition.
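A minimal sketch of where these settings go, assuming a PySpark job reading from Kafka. The broker address, topic name, app name and all numeric values are hypothetical placeholders, not the values used in this project. As far as I know, maxOffsetsPerTrigger is an option on the Kafka source, while spark.streaming.kafka.maxRatePerPartition is a session-level setting that comes from the older DStream Kafka API.

```python
# Hypothetical tuning values: placeholders to vary between runs, not measured optima.
SPARK_CONF = {
    "spark.default.parallelism": "4",
    "spark.streaming.kafka.maxRatePerPartition": "100",
}

# Options passed to the Kafka source; broker and topic are assumed names.
KAFKA_SOURCE_OPTIONS = {
    "kafka.bootstrap.servers": "localhost:9092",
    "subscribe": "sf.crime.calls",
    "maxOffsetsPerTrigger": "200",  # caps how many offsets one micro-batch reads
}


def build_stream(conf=SPARK_CONF, options=KAFKA_SOURCE_OPTIONS):
    """Create a SparkSession with `conf` and a Kafka streaming DataFrame with `options`."""
    # Imported here so the dictionaries above can be inspected without Spark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.master("local[*]").appName("sf-crime-tuning-sketch")
    for key, value in conf.items():
        builder = builder.config(key, value)
    spark = builder.getOrCreate()

    reader = spark.readStream.format("kafka")
    for key, value in options.items():
        reader = reader.option(key, value)
    return spark, reader.load()
```

Changing one value at a time and re-running the job makes it easier to attribute a change in throughput to a single setting.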
-
What were the 2-3 most efficient SparkSession property key/value pairs? Through testing multiple variations on values, how can you tell these were the most optimal?
- Answer - spark.default.parallelism and spark.streaming.kafka.maxRatePerPartition, chosen by monitoring processedRowsPerSecond while varying their values; a higher value indicates better throughput.
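One way to compare runs is to average processedRowsPerSecond over the progress reports of each run. The sketch below assumes each report is dict-like, as the entries of a PySpark query's recentProgress list are, and uses made-up sample numbers purely for illustration. The function names and the run labels are hypothetical.

```python
from statistics import mean


def average_processed_rows_per_second(progress_reports):
    """Average processedRowsPerSecond over micro-batches that actually received rows."""
    rates = [
        report["processedRowsPerSecond"]
        for report in progress_reports
        if report.get("numInputRows", 0) > 0
        and report.get("processedRowsPerSecond") is not None
    ]
    # Empty batches are skipped so idle triggers do not drag the average down.
    return mean(rates) if rates else 0.0


def best_setting(runs):
    """Return the label of the run with the highest average processing rate."""
    return max(runs, key=lambda label: average_processed_rows_per_second(runs[label]))


# Made-up progress reports for two hypothetical runs; not measurements from this project.
SAMPLE_RUNS = {
    "maxOffsetsPerTrigger=100": [
        {"numInputRows": 100, "processedRowsPerSecond": 80.0},
        {"numInputRows": 100, "processedRowsPerSecond": 120.0},
    ],
    "maxOffsetsPerTrigger=200": [
        {"numInputRows": 200, "processedRowsPerSecond": 150.0},
        {"numInputRows": 0, "processedRowsPerSecond": 0.0},
    ],
}

if __name__ == "__main__":
    for label, reports in SAMPLE_RUNS.items():
        print(label, average_processed_rows_per_second(reports))
    print("best:", best_setting(SAMPLE_RUNS))
```

With a live query, the list passed in would be the query's recentProgress collected after the job has run for a while with a given configuration.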