cal-itp/calitp-py

Add assertion / validation that all files returned from fetch_all_in_partition are actually in the partition


While testing cal-itp/data-infra#1696, I created an edge case that we probably want to validate against. If you copy a valid GTFS schedule extract from its original location in partition dt=A to a new location in partition dt=B and don't update the GCS metadata on the file, fetch_all_in_partition(dt=B) returns the extract, but the actual file is read from its original location in dt=A, because the location is resolved from the dt attribute in the metadata.

Basically we want to implement something like:

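# `day` is the partition value being requested
extracts = fetch_all_in_partition(dt=day)
# any dt found on the returned extracts other than the requested one
unexpected_dts = {extract.dt for extract in extracts} - {day}
assert not unexpected_dts, f"Found unexpected dates: {unexpected_dts}"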

This check belongs inside storage.fetch_all_in_partition, made generic so that it covers any partition key rather than only dt.
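
A rough sketch of what the generic version could look like, assuming the requested partitions are available as a dict of key/value pairs and that each extract exposes its metadata-derived partition values as attributes. `FakeExtract`, `find_partition_mismatches`, and `assert_all_in_partition` are hypothetical names for illustration, not existing calitp-py APIs.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Set


@dataclass
class FakeExtract:
    # Hypothetical stand-in for an extract built from GCS metadata.
    filename: str
    dt: str


def find_partition_mismatches(
    extracts: Iterable[Any], partitions: Dict[str, Any]
) -> Dict[str, Set[Any]]:
    """Return, per partition key, the unexpected values found on the extracts."""
    extracts = list(extracts)  # allow generators; we iterate once per key
    mismatches: Dict[str, Set[Any]] = {}
    for key, expected in partitions.items():
        # Values the extracts claim for this key, minus the one we asked for.
        unexpected = {getattr(extract, key) for extract in extracts} - {expected}
        if unexpected:
            mismatches[key] = unexpected
    return mismatches


def assert_all_in_partition(extracts: Iterable[Any], partitions: Dict[str, Any]) -> None:
    """Fail loudly if any extract's metadata disagrees with the requested partition."""
    mismatches = find_partition_mismatches(extracts, partitions)
    assert not mismatches, f"Found unexpected partition values: {mismatches}"
```

With something like this, the dt case above becomes `assert_all_in_partition(extracts, {"dt": day})`, and a file copied into dt=B with stale dt=A metadata would trip the assertion instead of being read silently from the wrong location.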