anhnongdan/Spark1.6_Problems

Working with long-running processes

  • Persist the intermediate data
  • Record profiling data
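
A minimal sketch of the profiling bullet, in plain Python rather than Spark. The `profiled` helper and the `profile_log` list are hypothetical names made up for this illustration, and the two stages are stand-ins for real work:

```python
import time
from contextlib import contextmanager

# Hypothetical in-memory profile log: one (stage name, seconds) entry per stage run.
profile_log = []

@contextmanager
def profiled(stage_name):
    """Time the enclosed block and append the result to profile_log."""
    start = time.perf_counter()
    try:
        yield
    finally:
        profile_log.append((stage_name, time.perf_counter() - start))

with profiled("load"):
    rows = [i * 2 for i in range(1000)]  # stand-in for a slow loading stage

with profiled("aggregate"):
    total = sum(rows)  # stand-in for a calculation stage

for name, seconds in profile_log:
    print(f"{name}: {seconds:.6f}s")
```

In Spark, transformations are lazy, so a timed block should include an action such as `count()`; otherwise only the construction of the plan is measured.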

When persisting intermediate data:

We can modularize each stage of data processing.
Data loading takes quite some time,
so we can work on multiple modules at once.
=> While the data is loading, we can do something with the calculation or the model.
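
The idea can be sketched in plain Python, assuming each stage's output fits in a pickle file. `run_stage`, `load_data`, `calculate`, and the file name are hypothetical and only illustrate the pattern; in Spark the closest built-ins would be `persist()` on an RDD or DataFrame (kept only for the lifetime of the application) or writing the intermediate DataFrame out, for example as Parquet, so that it survives across runs.

```python
import os
import pickle
import tempfile
from concurrent.futures import ThreadPoolExecutor

def run_stage(path, compute):
    """Return the stage result from `path` if persisted, else compute and persist it."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    result = compute()
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return result

def load_data():
    # Stand-in for the slow loading stage.
    return list(range(10))

def calculate(rows):
    # Stand-in for the calculation/model stage.
    return sum(r * r for r in rows)

workdir = tempfile.mkdtemp()
loaded_path = os.path.join(workdir, "loaded.pkl")

with ThreadPoolExecutor(max_workers=1) as pool:
    # Start loading in the background...
    future = pool.submit(run_stage, loaded_path, load_data)
    # ...and meanwhile exercise the calculation module on a small sample.
    sample_result = calculate([1, 2, 3])
    rows = future.result()

full_result = calculate(rows)

# A second call reads the persisted file instead of recomputing the stage.
rows_again = run_stage(loaded_path, lambda: [])
```

Because the loaded data is persisted, the calculation module can be rerun and changed many times without paying the loading cost again.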