Processing checkpoints and destructive operations on Dataframe
Opened this issue · 0 comments
Zejnilovic commented
We need to come up with a way to handle checkpoints around destructive operations. If I filter the data and lose some rows, I should be able to flag the next checkpoint as "post-destructive", "deduplication", or something similar.
This will pay off a lot once we implement validation #95 and are able to attach the control measures to the DataFrame instead of the Spark session.
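A minimal sketch of what this could look like (plain Python, with purely illustrative names — `Checkpoint`, `next_checkpoint`, and `counts_must_match` are hypothetical, not the project's API): the checkpoint carries a flag describing the operation that preceded it, and validation can use that flag to decide whether record counts are expected to match.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative sketch only: these names are hypothetical, not the project's API.

@dataclass
class Checkpoint:
    name: str
    record_count: int
    # Flag describing the operation that preceded this checkpoint,
    # e.g. "post-destructive" or "deduplication".
    # None means the previous operation preserved all rows.
    flag: Optional[str] = None

def next_checkpoint(name: str, record_count: int,
                    last_op_flag: Optional[str] = None) -> Checkpoint:
    """Create the next checkpoint, carrying a flag when the previous
    transformation was destructive (e.g. a filter or deduplication)."""
    return Checkpoint(name, record_count, last_op_flag)

def counts_must_match(prev: Checkpoint, curr: Checkpoint) -> bool:
    """Validation could skip the record-count equality check when the
    current checkpoint is flagged as following a destructive operation."""
    return curr.flag is None
```

With this shape, a checkpoint taken right after a `dropDuplicates` would be created as `next_checkpoint("afterDedup", n, "deduplication")`, and a later validation pass would know not to treat the row-count drop as an error.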