anhnongdan/Spark1.6_Problems

Performance of pySpark-tiny kernel

Closed · 1 comment

  • The driver fails to load the entire single file of tc_call_histories for 1 day (need to use a broadcast join)
  • The executors fail to load 10 days of tc_call_histories -> try loading the days in a loop
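A minimal sketch of the two ideas above, not the code used in this issue. It assumes one Parquet directory per day under a hypothetical base path; the function names, the path layout, and the file format are all assumptions. `broadcast` is available in `pyspark.sql.functions` from Spark 1.6, and `unionAll` is the Spark 1.6 name for what Spark 2.0+ calls `union`.

```python
from datetime import date, timedelta
from functools import reduce


def daily_paths(base_dir, first_day, n_days):
    """Return one input path per day, e.g. <base_dir>/2016-01-01 (hypothetical layout)."""
    return ["{}/{}".format(base_dir, (first_day + timedelta(days=i)).isoformat())
            for i in range(n_days)]


def load_days(sql_context, paths):
    """Read each day separately in a loop, then combine the per-day DataFrames."""
    # Assumption: the data is stored as Parquet; swap the reader for other formats.
    frames = [sql_context.read.parquet(p) for p in paths]
    # unionAll is the Spark 1.6 method name (union in Spark 2.0+).
    return reduce(lambda left, right: left.unionAll(right), frames)


def join_with_small_table(big_df, small_df, key):
    """Join a large DataFrame against a small one using a broadcast hint."""
    # Imported here so the helpers above can be used without Spark installed.
    from pyspark.sql.functions import broadcast
    # The hint asks Spark to ship small_df to every executor instead of shuffling big_df.
    return big_df.join(broadcast(small_df), key)
```

With a real `sqlContext`, usage would look like `load_days(sqlContext, daily_paths("/data/tc_call_histories", date(2016, 1, 1), 10))`. The broadcast hint only helps when the other side of the join is small enough to fit in memory on each executor.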

Reading succeeds with the loop,
though the execution time is quite long.

=> To guard against interruptions, write the intermediate results to disk.
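One way this could look, as a sketch under stated assumptions: each day is processed and written out before the next one starts, and days that already have output are skipped on a rerun. The function name `process_days`, the Parquet format, and the output layout are hypothetical. The skip check relies on the `_SUCCESS` marker that the default Hadoop output committer leaves behind, and on the output directory being visible through the local filesystem (on HDFS the existence check would need a different mechanism).

```python
import os


def process_days(sql_context, day_paths, out_dir, transform=lambda df: df):
    """Process one day at a time and persist each result before moving on.

    Returns the list of output paths written during this run.
    """
    written = []
    for path in day_paths:
        day = os.path.basename(path.rstrip("/"))
        out_path = os.path.join(out_dir, day)
        # Skip days completed by an earlier run (assumes a _SUCCESS marker file).
        if os.path.exists(os.path.join(out_path, "_SUCCESS")):
            continue
        df = sql_context.read.parquet(path)  # assumption: Parquet input
        # "overwrite" replaces any partial output left by an interrupted run.
        transform(df).write.mode("overwrite").parquet(out_path)
        written.append(out_path)
    return written
```

If the job dies on day 7 of 10, rerunning the same call only redoes day 7 onward, since the first six days already carry their marker.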