New Rule Proposal: Individual scheduled trip_id not accounted for in Trip Updates feed
evansiroky opened this issue · 3 comments
Summary:
An error should be raised whenever an individual Trip that should be in service at the time a Trip Update feed was downloaded is not accounted for in any TripUpdate record in the Trip Update feed.
Steps to reproduce:
Given a TripUpdate dataset
and its associated GTFS Schedule dataset
When the validator has compiled a list of all trips that should be currently in service
and has scanned through all TripUpdate entities in a Trip Updates feed
and does not find an individual Trip that was expected to be in service being accounted for in any TripUpdate record
Then the validator should flag the trip_id in question for not having any corresponding TripUpdate entity describing its realtime status in the Trip Updates feed.
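The steps above boil down to a set difference between the trips expected to be in service and the trips mentioned in the feed. Here is a rough sketch in Python, assuming the feed has already been decoded into plain dictionaries shaped like the GTFS-realtime messages; `find_unaccounted_trips` and the dictionary layout are illustrative assumptions, not existing validator code.

```python
def find_unaccounted_trips(active_trip_ids, feed_entities):
    """Return the sorted trip_ids expected in service but absent from the feed.

    active_trip_ids: iterable of trip_id strings expected to be in service.
    feed_entities: list of dicts mimicking FeedEntity (assumed layout).
    """
    seen = set()
    for entity in feed_entities:
        trip_update = entity.get("trip_update")
        if trip_update is None:
            continue  # not a TripUpdate entity (e.g. a vehicle position or alert)
        trip_id = trip_update.get("trip", {}).get("trip_id")
        if trip_id:
            seen.add(trip_id)
    # Every expected trip that no TripUpdate mentions gets flagged individually.
    return sorted(set(active_trip_ids) - seen)
```

Each returned trip_id would map to one occurrence of the proposed rule.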
Expected behavior:
The GTFS-Realtime Best Practices state:
Feeds should cover the vast majority of trips included in the companion static GTFS dataset. In particular, it should include data for high-density and high-traffic city areas and busy routes.
The GTFS Validator should individually flag every trip_id that should have been accounted for at the time the trip should have been in service.
Observed behavior:
An error or warning is not raised for this problem at this time.
etc
This issue seeks to add more detailed scope to #119.
Is there an established method for determining if a trip should be currently in service? If no, would the correct approach be to see if the current time is between the first and last stop times for a trip?
> would the correct approach be to see if the current time is between the first and last stop times for a trip?
Yes, this is what should be done, but the calendar files also need to be taken into account to make sure that the trip is supposed to be operating on the given day in question.
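To make that concrete, here is a minimal sketch of the check in Python. It assumes the calendar.txt row and the calendar_dates.txt exceptions for the trip's service_id are already loaded into dictionaries, and that `now` is a naive datetime in the agency's local time. The function names are made up for illustration. It also measures stop times from midnight of the service date, which is a simplification that can be off on days with a Daylight Saving Time change.

```python
from datetime import datetime, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]

def gtfs_time_to_seconds(value):
    """Convert a GTFS HH:MM:SS time (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds

def service_runs_on(service_date, calendar_row, exceptions):
    """calendar_row: dict of calendar.txt fields, or None if the service has no row.
    exceptions: dict mapping a date to a calendar_dates.txt exception_type
    (1 = service added, 2 = service removed)."""
    exception = exceptions.get(service_date)
    if exception == 1:
        return True
    if exception == 2:
        return False
    if calendar_row is None:
        return False
    start = datetime.strptime(calendar_row["start_date"], "%Y%m%d").date()
    end = datetime.strptime(calendar_row["end_date"], "%Y%m%d").date()
    if not start <= service_date <= end:
        return False
    return calendar_row[WEEKDAYS[service_date.weekday()]] == "1"

def trip_in_service(now, service_date, first_stop_time, last_stop_time,
                    calendar_row, exceptions):
    """True if the service runs on service_date and now falls between the
    trip's first and last scheduled stop times."""
    if not service_runs_on(service_date, calendar_row, exceptions):
        return False
    midnight = datetime.combine(service_date, datetime.min.time())
    start = midnight + timedelta(seconds=gtfs_time_to_seconds(first_stop_time))
    end = midnight + timedelta(seconds=gtfs_time_to_seconds(last_stop_time))
    return start <= now <= end
```

For a trip that runs past midnight, `service_date` would be the previous calendar day, so a caller would need to try both today and yesterday as candidate service dates.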
In reviewing this with @bdferris-v2, it seems this could be a fairly complicated implementation. Currently there is no easy way to get a list of trips that should be active from the static feed data. We would have to add code to build a more structured list of trips by service date, start time, and end time (accounting for trips that span days/midnight) that could be more easily indexed for real-time validation. Additionally, he mentioned complications like the way Daylight Saving Time is handled in GTFS data (we didn’t get into details here), as well as trips in blocks that can have cascading delays (a late trip can make the next trip in the block late).
This seems like it could be a significant effort that maybe should be broken into smaller chunks? Open to suggestions!
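As one possible first chunk, the structured list could be an index keyed by service_id that stores each trip's start and end as seconds since the start of the service day. The sketch below is only an illustration: `build_trip_index`, `active_trips`, and the `services_on` callback (which would wrap the calendar lookup) are hypothetical names. It looks one day back to catch trips that run past midnight, so it assumes no trip runs beyond 48:00:00, and it ignores Daylight Saving Time and cascading block delays entirely.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def to_seconds(gtfs_time):
    """Convert a GTFS HH:MM:SS time (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in gtfs_time.split(":"))
    return hours * 3600 + minutes * 60 + seconds

def build_trip_index(trips):
    """trips: iterable of (trip_id, service_id, first_stop_time, last_stop_time).
    Returns service_id -> list of (start_seconds, end_seconds, trip_id)."""
    index = defaultdict(list)
    for trip_id, service_id, first, last in trips:
        index[service_id].append((to_seconds(first), to_seconds(last), trip_id))
    return index

def active_trips(index, now, services_on):
    """Return (service_date, trip_id) pairs scheduled to be running at `now`.

    services_on: callable mapping a date to the set of service_ids active on
    that date (hypothetical; stands in for the calendar files lookup).
    """
    active = []
    for days_back in (0, 1):  # today, then yesterday for after-midnight trips
        service_date = now.date() - timedelta(days=days_back)
        midnight = datetime.combine(service_date, datetime.min.time())
        elapsed = int((now - midnight).total_seconds())
        for service_id in services_on(service_date):
            for start, end, trip_id in index.get(service_id, []):
                if start <= elapsed <= end:
                    active.append((service_date, trip_id))
    return active
```

A linear scan per service_id is probably fine to start with; sorting each list by start time or using an interval tree could come later if lookups turn out to be slow.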