cal-itp/data-infra

Open data publishing (GTFS): Handle file-level errors so subsequent files can be attempted

Closed this issue · 0 comments

As a Cal-ITP data maintainer, I want the GTFS schedule open dataset to get updated to the greatest extent possible on each publish run so that our open data is up to date.

Currently, if one file fails (specifically, stop times tends to fail because of its size), then the subsequent files (in alphabetical order) are not even attempted. At time of writing, stop times has been failing since November 27, which means that stops, transfers, translations, and trips have not even been attempted since then, even though they likely would have succeeded (the failure seems to be specific to the multipart upload for stop times).

Basically, I think we should add some exception handling ~here: https://github.com/cal-itp/data-infra/blob/main/warehouse/scripts/publish.py#L478-L488 so that if one file fails, we still continue trying to update the subsequent files (and then surface the failure at the end of the run).
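As a rough sketch of the shape this could take (the actual loop in `warehouse/scripts/publish.py` differs; `publish_files`, `upload_file`, and the file names here are illustrative placeholders, not the real API): attempt every file, collect any exceptions, and only raise after the whole list has been tried, so the run still fails visibly but later files get their chance.

```python
def publish_files(files, upload_file):
    """Attempt to publish every file; collect failures instead of
    aborting on the first one, then raise at the end if any failed."""
    failures = {}
    for name in files:
        try:
            upload_file(name)
        except Exception as exc:  # deliberately broad: keep going per-file
            failures[name] = exc
    if failures:
        # Re-raise after all files were attempted so the run is still
        # marked as failed and the bad files are easy to spot in logs.
        failed = ", ".join(sorted(failures))
        raise RuntimeError(f"failed to publish: {failed}")
```

With this shape, a multipart-upload failure on stop times would no longer block stops, transfers, translations, and trips from being attempted in the same run.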