Interpreting Spark-perf results
yodha12 opened this issue · 1 comment
I just started using spark-perf-master and I am running only the pyspark tests. After the run, it writes output to the result folder, but I don't clearly understand what those numbers mean. For example:
python-scheduling-throughput, SchedulerThroughputTest --num-tasks=5000 --num-trials=10 --inter-trial-wait=3, 2.505, 0.145, 2.383, 2.789, 2.460
python-agg-by-key, AggregateByKey --num-trials=10 --inter-trial-wait=3 --num-partitions=400 --reduce-tasks=400 --random-seed=5 --persistent-type=memory --num-records=200000000 --unique-keys=20000 --key-length=10 --unique-values=1000000 --value-length=10, 28.7235, 0.203, 28.461, 29.106, 28.537
What do the numbers 2.505, 0.145, etc. mean for the first pyspark job, and 28.7235, 0.203, etc. for the second job?
See spark-perf/lib/sparkperf/utils.py, line 41 at commit 79f8cfa:
return (result_med, result_std, result_min, result_first, result_last)
The five numbers after each test's configuration are, in order: the median result across all trials, the standard deviation, the minimum, the first trial's result, and the last trial's result (for these timed tests, each per-trial result is a runtime). So for python-scheduling-throughput, over the 10 trials the median was 2.505, the standard deviation 0.145, the minimum 2.383, the first trial 2.789, and the last trial 2.460.
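As a minimal sketch of how such a tuple could be derived from per-trial results (the function name and exact statistics calls here are illustrative, not the actual spark-perf implementation; see utils.py for the real code):
```python
import statistics

def summarize_trials(times):
    """Summarize a list of per-trial results into the five-column tuple
    that appears in the result file after the test's options:
    (median, standard deviation, minimum, first trial, last trial).

    Illustrative only: spark-perf may compute the standard deviation
    differently (e.g. population vs. sample).
    """
    result_med = statistics.median(times)
    result_std = statistics.stdev(times)  # sample standard deviation
    result_min = min(times)
    result_first = times[0]
    result_last = times[-1]
    return (result_med, result_std, result_min, result_first, result_last)

# For example, the python-scheduling-throughput row above would come from
# ten trial times whose summary works out to
# (2.505, 0.145, 2.383, 2.789, 2.460).
```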