AbsaOSS/pramen

Allow postprocessing as a part of a transformation or ingestion

Closed this issue · 0 comments

Background

Some metadata can be calculated only after the result of a transformer is saved.
Currently, a transformer's run() method returns a dataframe. There is no way to add data quality checks after the dataframe is saved, other than by creating another transformer. To make this easier, a new method in the Transformer interface is proposed.

Feature

Allow postprocessing as a part of a transformation.

Proposed Solution

For transformers:

/**
  * This method is called after the transformation is finished. You can query the output table for the output information
  * date, and the data should be there.
  *
  * @param outputTableName The table name used as the output table of the transformer.
  * @param metastore       The read-only version of the metastore. You can only query tables using it.
  * @param infoDate        The information date of the output of the transformation.
  * @param options         Extra options specified in the configuration for the transformation.
  */
def postProcess(outputTableName: String,
                metastore: MetastoreReader,
                infoDate: LocalDate,
                options: Map[String, String]): Unit = {}
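
As an illustration of the kind of check this hook would enable, here is a minimal sketch of a transformer that verifies a minimum record count once the output has been saved. `MetastoreReader` and `Transformer` below are simplified stand-ins, not Pramen's actual traits (the real reader works with Spark dataframes), and `getTable(tableName, infoDate)`, `InMemoryMetastore`, `MinCountTransformer` and the `min.records` option are hypothetical names used only for this example.

```scala
import java.time.LocalDate

// Simplified stand-in for Pramen's MetastoreReader (assumption: rows are plain maps here,
// whereas the real reader returns Spark dataframes).
trait MetastoreReader {
  def getTable(tableName: String, infoDate: LocalDate): Seq[Map[String, Any]]
}

// Simplified stand-in for the Transformer trait, reduced to the proposed hook.
trait Transformer {
  def postProcess(outputTableName: String,
                  metastore: MetastoreReader,
                  infoDate: LocalDate,
                  options: Map[String, String]): Unit = {}
}

// Hypothetical in-memory metastore, used only to exercise the sketch.
class InMemoryMetastore(tables: Map[(String, LocalDate), Seq[Map[String, Any]]]) extends MetastoreReader {
  override def getTable(tableName: String, infoDate: LocalDate): Seq[Map[String, Any]] =
    tables.getOrElse((tableName, infoDate), Seq.empty)
}

// Hypothetical transformer that runs a data quality check after its output is saved.
class MinCountTransformer extends Transformer {
  override def postProcess(outputTableName: String,
                           metastore: MetastoreReader,
                           infoDate: LocalDate,
                           options: Map[String, String]): Unit = {
    // "min.records" is an illustrative option key, not an existing Pramen setting.
    val minRecords = options.getOrElse("min.records", "1").toLong
    val actual = metastore.getTable(outputTableName, infoDate).size.toLong

    // Throws IllegalArgumentException when the saved output has too few records.
    require(actual >= minRecords,
      s"Table '$outputTableName' has $actual record(s) for $infoDate, expected at least $minRecords")
  }
}
```

With this shape, the check lives next to the transformation that produces the data, so no separate transformer is needed just to validate the output.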

For sources:

/**
  * This method is called after the ingestion is finished. You can query the output table for the output information
  * date, and the data should be there.
  *
  * @param query           The query used to read the data from the source.
  * @param outputTableName The table name used as the output table of the ingestion.
  * @param metastore       The read-only version of the metastore. You can only query tables using it.
  * @param infoDate        The information date of the output of the ingestion.
  * @param options         Extra options specified in the operation definition for the ingestion.
  */
def postProcess(query: Query,
                outputTableName: String,
                metastore: MetastoreReader,
                infoDate: LocalDate,
                options: Map[String, String]): Unit = {}

The default empty implementation of the method in the trait should make the change non-breaking.
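
The non-breaking claim can be sketched for the source variant as follows. This is an illustration under stated assumptions, not Pramen's code: `Query`, `MetastoreReader` and `Source` are reduced stand-ins for the real types, and `LegacySource` and `AuditingSource` are hypothetical implementations.

```scala
import java.time.LocalDate
import scala.collection.mutable.ListBuffer

// Reduced stand-in for Pramen's Query type (assumption: only two variants are modelled).
sealed trait Query
object Query {
  final case class Sql(sql: String) extends Query
  final case class Table(name: String) extends Query
}

// Empty stand-in: this sketch never reads from the metastore.
trait MetastoreReader

// Stand-in for the Source trait, reduced to the proposed hook with its empty default body.
trait Source {
  def postProcess(query: Query,
                  outputTableName: String,
                  metastore: MetastoreReader,
                  infoDate: LocalDate,
                  options: Map[String, String]): Unit = {}
}

// An implementation written before the change: it still compiles and inherits the no-op.
class LegacySource extends Source

// A hypothetical implementation that opts in and records what was ingested.
class AuditingSource extends Source {
  val audit: ListBuffer[String] = ListBuffer.empty[String]

  override def postProcess(query: Query,
                           outputTableName: String,
                           metastore: MetastoreReader,
                           infoDate: LocalDate,
                           options: Map[String, String]): Unit = {
    val origin = query match {
      case Query.Sql(sql)    => s"sql '$sql'"
      case Query.Table(name) => s"table '$name'"
    }
    audit += s"$outputTableName@$infoDate loaded from $origin"
  }
}
```

Because the trait supplies the empty body, only implementations that need the hook have to override it; everything else keeps its current behavior.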