Add assertion / validation that all files returned from fetch_all_in_partition are actually in the partition
lauriemerrell commented
While testing cal-itp/data-infra#1696, I created an edge case that we probably want to validate against: if you copy a valid GTFS schedule extract from its original location in partition `dt=A` to a new location in partition `dt=B` and don't update the GCS metadata on the file, `fetch_all_in_partition(dt=B)` will return the extract, but the actual file will be read from its original location in `dt=A`, based on the `dt` attribute in the metadata.
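To make the mismatch concrete, here is a minimal sketch that compares the Hive-style `key=value` segments of an object's path against the partition values stored in its metadata. The helper names (`partitions_from_path`, `stale_metadata_keys`) and the example path are hypothetical, and metadata is modeled as a plain dict rather than real GCS blob metadata.

```python
from typing import Dict, Tuple


def partitions_from_path(path: str) -> Dict[str, str]:
    """Parse Hive-style `key=value` segments from an object path (hypothetical helper)."""
    parts: Dict[str, str] = {}
    for segment in path.split("/"):
        key, sep, value = segment.partition("=")
        if sep and key:
            parts[key] = value
    return parts


def stale_metadata_keys(path: str, metadata: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Return {key: (value_in_path, value_in_metadata)} for partition keys that disagree."""
    path_parts = partitions_from_path(path)
    return {
        key: (value, str(metadata[key]))
        for key, value in path_parts.items()
        if key in metadata and str(metadata[key]) != value
    }


# The copied extract: it now lives under dt=B, but its metadata still says dt=A.
copied_path = "schedule/dt=B/extract.zip"  # hypothetical layout
stale_metadata = {"dt": "A"}
print(stale_metadata_keys(copied_path, stale_metadata))  # → {'dt': ('B', 'A')}
```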
Basically we want to implement something like:
```python
# Every extract fetched for partition dt=day should carry that same dt in its metadata
extracts = fetch_all_in_partition(dt=day)
unexpected_dts = {extract.dt for extract in extracts} - {day}
assert not unexpected_dts, f"Found unexpected dates: {unexpected_dts}"
```
This check should live inside `storage.fetch_all_in_partition`, but generalized to every partition key rather than being `dt`-specific.
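A generic version could look something like the sketch below. It assumes the requested partition is available as a mapping of partition key to expected value (e.g. `{"dt": ...}` or `{"dt": ..., "ts": ...}`) and that each returned item exposes those keys as attributes, as `extract.dt` does above. The function name `check_all_in_partition` and the `FakeExtract` stand-in are hypothetical; where exactly it would be called inside `fetch_all_in_partition` is also an assumption.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Set


def check_all_in_partition(items: Iterable[Any], partitions: Dict[str, Any]) -> List[Any]:
    """Raise ValueError if any item's partition attributes differ from the requested partition."""
    items = list(items)
    mismatches: Dict[str, Set[Any]] = {}
    for key, expected in partitions.items():
        # Values of this partition key that do not match what was requested
        unexpected = {getattr(item, key) for item in items} - {expected}
        if unexpected:
            mismatches[key] = unexpected
    if mismatches:
        raise ValueError(f"Found items outside partition {partitions}: {mismatches}")
    return items


@dataclass(frozen=True)
class FakeExtract:  # hypothetical stand-in for an extract model
    dt: str


good = [FakeExtract(dt="B"), FakeExtract(dt="B")]
bad = good + [FakeExtract(dt="A")]  # the copied file whose metadata still says dt=A

check_all_in_partition(good, {"dt": "B"})  # passes
try:
    check_all_in_partition(bad, {"dt": "B"})
except ValueError as err:
    print(err)
```

Raising `ValueError` rather than using a bare `assert` keeps the check active even when Python runs with `-O`, and looping over the requested keys means the same check covers `dt`, `ts`, or any other partition key.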