AbsaOSS/pramen

Implement a mode of running multiple pipelines sequentially without re-creating the Spark Session

Background

Submitting a Spark job takes time on any cluster. Sometimes, when pipelines are quite small, you may want to group them and run them one by one within a single Spark Session. This saves the time spent starting and stopping a separate Spark application for each pipeline.

Feature

Implement a mode of running multiple pipelines sequentially without re-creating the Spark Session.

An option --workflows can be introduced, in contrast to the existing --workflow, accepting a comma-separated list of pipeline configuration files.

Example

```
spark-submit pramen-runner.jar --workflows pipeline1.conf,pipeline2.conf,pipeline3.conf
```
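
For illustration, here is a minimal sketch in Scala (the language Pramen is written in) of how the comma-separated --workflows value could be split into individual pipeline config paths. The WorkflowArgs object and parseWorkflows method are hypothetical names, not part of Pramen's actual code:

```scala
// Hypothetical helper: splits the --workflows argument into config paths.
object WorkflowArgs {
  def parseWorkflows(arg: String): Seq[String] =
    arg.split(',').map(_.trim).filter(_.nonEmpty).toSeq
}

// WorkflowArgs.parseWorkflows("pipeline1.conf,pipeline2.conf,pipeline3.conf")
// => Seq("pipeline1.conf", "pipeline2.conf", "pipeline3.conf")
```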

Additional context

This way of running pipelines assumes:

  • Pipelines are fully independent, including bookkeeping and email options
  • Notifications are sent separately for each pipeline
  • If one pipeline fails, the others will still run (see the sketch after this list)
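
The sequential loop might look roughly like the following sketch. SequentialRunner, runPipeline, and sendNotification are hypothetical placeholders, not Pramen's actual API; the point is only that a single SparkSession is passed to every pipeline, notifications are sent per pipeline, and a failure in one pipeline does not stop the rest:

```scala
import scala.util.{Failure, Success, Try}

import org.apache.spark.sql.SparkSession

// Hypothetical sketch of running several pipelines on one SparkSession.
object SequentialRunner {
  def runAll(configs: Seq[String], spark: SparkSession): Unit =
    configs.foreach { conf =>
      // Wrap each pipeline in Try so a failure in one pipeline does not
      // prevent the remaining pipelines from running on the same session.
      Try(runPipeline(conf, spark)) match {
        case Success(_) => sendNotification(conf, succeeded = true)
        case Failure(ex) =>
          ex.printStackTrace()
          sendNotification(conf, succeeded = false)
      }
    }

  // Placeholder for loading a pipeline config and executing the pipeline.
  private def runPipeline(conf: String, spark: SparkSession): Unit =
    println(s"Running pipeline from $conf on Spark ${spark.version}")

  // Placeholder for per-pipeline email notifications.
  private def sendNotification(conf: String, succeeded: Boolean): Unit =
    println(s"Notification for $conf: ${if (succeeded) "success" else "failure"}")
}
```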