databricks/spark-perf

Interpreting Spark-perf results

yodha12 opened this issue · 1 comment

I just started using spark-perf (master) and I am running only the PySpark tests. After the run it writes output to the results folder, but I don't clearly understand what those numbers mean. For example:

python-scheduling-throughput, SchedulerThroughputTest --num-tasks=5000 --num-trials=10 --inter-trial-wait=3, 2.505, 0.145, 2.383, 2.789, 2.460

python-agg-by-key, AggregateByKey --num-trials=10 --inter-trial-wait=3 --num-partitions=400 --reduce-tasks=400 --random-seed=5 --persistent-type=memory --num-records=200000000 --unique-keys=20000 --key-length=10 --unique-values=1000000 --value-length=10 , 28.7235, 0.203, 28.461, 29.106, 28.537

What do the numbers 2.505, 0.145, etc. mean for the first PySpark job, and 28.7235, 0.203, etc. for the second job?

See the tuple returned after each test run:

    return (result_med, result_std, result_min, result_first, result_last)

The five numbers printed after the test name and its options are, in that order, the median, standard deviation, minimum, first-trial, and last-trial run times (in seconds) across the trials.
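
For reference, here is a minimal sketch of how such a summary tuple can be computed from per-trial times. The helper name summarize_trials and the sample data are hypothetical, for illustration only; spark-perf's own runner computes an equivalent tuple internally:

    import statistics

    def summarize_trials(times):
        # times: per-trial run times in seconds, in the order the trials ran.
        # Hypothetical helper for illustration, not spark-perf's actual code.
        result_med = statistics.median(times)
        # Sample standard deviation; spark-perf may use a slightly
        # different estimator internally.
        result_std = statistics.stdev(times) if len(times) > 1 else 0.0
        result_min = min(times)
        result_first = times[0]   # first trial, often slower due to warm-up
        result_last = times[-1]
        return (result_med, result_std, result_min, result_first, result_last)

    # Example with made-up trial times (seconds):
    trials = [2.79, 2.55, 2.51, 2.50, 2.47, 2.46]
    print(summarize_trials(trials))
    # -> (2.505, 0.123..., 2.46, 2.79, 2.46)

Note how the first trial is typically the slowest (JVM/executor warm-up), which is why the median rather than the mean is reported as the headline number.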