AbsaOSS/pramen

When 'trackDays=0', Pramen should never check the source data record count if the table is already loaded for the day

Closed this issue · 0 comments

Background

Snapshot tables are those that do not have an information date column. Once such a table is loaded, it represents a momentary snapshot of some operational data. After the table is ingested, the data can still change on the source side.

For various reasons, ingestion pipelines can run more than once per day. Currently, there is no way to specify that a table should not be re-checked for the current information date. For snapshot tables this would be very helpful.

Feature

When 'trackDays=0', Pramen should never check the source data record count if the table is already loaded for the day.

Proposed Solution

When track.days = 0, never check the number of records for the partition that is already loaded.
When track.days = 1, check the number of records for the current info date only.
When track.days = 2, check the number of records for the current info date and the day before only.

Basically, a previous 'track.days = 1' now needs to be set to 'track.days = 2' to keep the same behavior.
Likewise, a previous 'track.days = 0' now needs to be set to 'track.days = 1' to keep the same behavior.

'track.days = 0' now has the new meaning of never checking data that was already loaded.
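
The proposed semantics can be illustrated with a minimal Python sketch. This is not Pramen code: the function names `dates_to_recheck` and `migrate_track_days` are hypothetical and exist only for illustration. The general rule `new = old + 1` is an assumption extrapolated from the two cases listed above.

```python
from __future__ import annotations

from datetime import date, timedelta


def dates_to_recheck(info_date: date, track_days: int) -> list[date]:
    """Info dates whose already-loaded partitions would get a record count check.

    Hypothetical helper that mirrors the proposed meaning of track.days.
    """
    if track_days < 0:
        raise ValueError("this sketch only covers track_days >= 0")
    # track_days = 0 gives an empty range, so loaded partitions are never re-checked.
    # track_days = N gives the current info date plus the N - 1 days before it,
    # ordered from oldest to newest.
    return [info_date - timedelta(days=offset)
            for offset in range(track_days - 1, -1, -1)]


def migrate_track_days(old_value: int) -> int:
    """Map an old track.days value to the new value with the same behavior.

    Assumption: the shift by one holds for all non-negative values,
    not just the old values 0 and 1 mentioned in the issue.
    """
    if old_value < 0:
        raise ValueError("this sketch only covers old values >= 0")
    return old_value + 1
```

For example, `dates_to_recheck(date(2023, 5, 10), 2)` covers 2023-05-09 and 2023-05-10, while `track_days = 0` yields no dates at all, which is the new "never re-check" behavior wanted for snapshot tables.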