TheDataLeek/HadoopInspector

add: distinction in results-db between run timestamp & data timestamp

Closed this issue · 1 comment

Right now the results database records only when a check was run, not which time period of data the check was run against.

Add a second pair of timestamps recording the start and stop of the period the check was run against. This matters most for incremental testing, where each run checks only one partition of the data rather than the whole table.
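A minimal sketch of a results table that keeps both pairs of timestamps, using SQLite purely for illustration. The four timestamp column names are the ones proposed in this issue. The table name `check_results`, the other columns, and the helpers `record_result` and `results_covering` are hypothetical and are not HadoopInspector's actual schema or API. The second helper shows why the split matters: it finds results by the data they covered, regardless of when the check ran.

```python
import sqlite3

# Hypothetical results table. The four timestamp column names come from this
# issue; the table name and the other columns are assumptions for illustration.
SCHEMA = """
CREATE TABLE IF NOT EXISTS check_results (
    check_name           TEXT NOT NULL,
    mode                 TEXT NOT NULL CHECK (mode IN ('full', 'incremental')),
    -- when the check process ran
    run_start_timestamp  TEXT NOT NULL,
    run_stop_timestamp   TEXT NOT NULL,
    -- the period of data the check examined (NULL start = unbounded)
    data_start_timestamp TEXT,
    data_stop_timestamp  TEXT
)
"""


def record_result(conn, check_name, mode, run_start, run_stop, data_start, data_stop):
    """Insert one result row; timestamps are ISO-8601 strings."""
    conn.execute(
        "INSERT INTO check_results VALUES (?, ?, ?, ?, ?, ?)",
        (check_name, mode, run_start, run_stop, data_start, data_stop),
    )


def results_covering(conn, ts):
    """Return names of checks whose data period contains ts.

    Treats the period as half-open [start, stop) and a NULL start as
    unbounded. Plain text comparison is valid only because every
    timestamp uses the same ISO-8601 format.
    """
    rows = conn.execute(
        "SELECT check_name FROM check_results "
        "WHERE (data_start_timestamp IS NULL OR data_start_timestamp <= ?) "
        "AND ? < data_stop_timestamp",
        (ts, ts),
    )
    return [name for (name,) in rows]
```

Storing timestamps as same-format ISO-8601 text keeps the example simple; integer epoch seconds would work equally well and avoid the format-consistency caveat.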

Defines when the process ran:

  • run_start_timestamp
  • run_stop_timestamp

Defines the period being checked:

  • data_start_timestamp
  • data_stop_timestamp

How that period is set depends on the check mode:

  • if mode == 'full', the period is all data, nominally up to the run time, though only approximately:
    • the table could contain future-dated data
    • or the table might have no data at all for, say, the past 3 weeks
  • if mode == 'incremental', the period is the partitioning date used by the incremental checks
    • but incremental partitions use many different date formats
    • maybe the setup should also provide regular start & stop timestamps when it provides incremental checks?
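A minimal sketch of how the data period could be derived for each mode. The helper `data_period` is hypothetical. It assumes, along the lines of the last suggestion above, that the setup supplies a `strptime` format for each incremental partition value, and that every partition covers exactly one day; real partitions may be hourly, monthly, or otherwise. For full mode it returns an unbounded start (`None`) and uses the run start as a nominal stop, which is only an approximation for the reasons listed above.

```python
from datetime import datetime, timedelta


def data_period(mode, run_start, partition_value=None, partition_format=None):
    """Return (data_start, data_stop) as datetimes; None means unbounded.

    Hypothetical helper: assumes incremental partitions are one day wide
    and that the setup supplies a strptime format for the partition value.
    """
    if mode == "full":
        # All data in the table, nominally up to the run time. This is only
        # approximate: the table may hold future-dated rows or recent gaps.
        return None, run_start
    if mode == "incremental":
        if partition_value is None or partition_format is None:
            raise ValueError("incremental mode needs a partition value and format")
        # Normalize whatever partition format the setup uses into a datetime.
        start = datetime.strptime(partition_value, partition_format)
        return start, start + timedelta(days=1)
    raise ValueError(f"unknown mode: {mode!r}")
```

Because the format string travels with each incremental check, differently formatted partition keys such as `20160501` and `2016-05-01` both normalize to the same data timestamps.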

Added in 0.1.5