google/transit

stop_times.shape_dist_traveled shouldn't be defined if the trip doesn't have a shape associated


Context

The spec definition of stop_times.shape_dist_traveled says (reference):

Actual distance traveled along the associated shape, from the first stop to the stop specified in this record. This field specifies how much of the shape to draw between any two stops during a trip. Must be in the same units used in shapes.txt.

We've observed cases where stop_times.shape_dist_traveled is specified for trips that don't have a shape associated (no shape_id for the trip_id referenced in stop_times.txt).

trips.txt

route_id  service_id  trip_id  shape_id
route_a   regular     trip_1   (empty)
route_a   regular     trip_2   (empty)
route_a   regular     trip_3   (empty)

stop_times.txt

trip_id  stop_id  stop_sequence  shape_dist_traveled
trip_1   stop_3   3              0
trip_1   stop_4   4              298
trip_1   stop_5   5              1029

Although this doesn't break anything, it seems like it's never intentional, and could potentially mean that a shape was intended to be associated.

Proposed solution

We would like to amend the specification with a "should" statement and add a check in the Canonical GTFS Schedule Validator with a WARNING severity level.

The new statement could look like this:

shape_dist_traveled should not be specified if the trip_id value does not have a shape_id defined in trips.txt.
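A rough sketch of what such a check might look like, written in Python for illustration only. The function name and the returned tuples are hypothetical and are not the Canonical GTFS Schedule Validator's API; the sample data is the example from the Context section above.

```python
import csv
import io


def find_dist_without_shape(trips_csv, stop_times_csv):
    """Return (trip_id, stop_sequence) pairs where shape_dist_traveled is set
    but the trip has no shape_id in trips.txt. Hypothetical helper."""
    # Collect the trip_ids that have a non-empty shape_id.
    trips_with_shape = {
        row["trip_id"]
        for row in csv.DictReader(io.StringIO(trips_csv))
        if (row.get("shape_id") or "").strip()
    }
    flagged = []
    for row in csv.DictReader(io.StringIO(stop_times_csv)):
        has_dist = (row.get("shape_dist_traveled") or "").strip() != ""
        if has_dist and row["trip_id"] not in trips_with_shape:
            flagged.append((row["trip_id"], row["stop_sequence"]))
    return flagged


TRIPS = """route_id,service_id,trip_id,shape_id
route_a,regular,trip_1,
route_a,regular,trip_2,
route_a,regular,trip_3,
"""

STOP_TIMES = """trip_id,stop_id,stop_sequence,shape_dist_traveled
trip_1,stop_3,3,0
trip_1,stop_4,4,298
trip_1,stop_5,5,1029
"""

# Every stop_times row of trip_1 is flagged, since trip_1 has no shape_id.
print(find_dist_without_shape(TRIPS, STOP_TIMES))
```

Each flagged pair would map to one notice; which severity that notice gets is the open question below.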

Questions we'd like the community's input on before proposing a spec amendment

  • Are we in agreement that this should be flagged as a WARNING?
  • Do we also need shapes.shape_dist_traveled to be defined to make use of stop_times.shape_dist_traveled?

Nope, because shape distance can also be used to describe the distance traveled, for example for fare computation. An actual geographical shape is not required for this.

And to the second question, also nope. While it does improve linear referencing, it is not required.
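To illustrate the fare use case, here is a minimal sketch, assuming a consumer only needs the distance between two stops of a trip: it is the difference of the two shape_dist_traveled values, and shapes.txt is never read. The function name is made up for the example, and the rows reuse the sample data from the issue description.

```python
def distance_between(stop_times, trip_id, from_seq, to_seq):
    """Distance traveled on a trip between two stop_sequence values,
    expressed in whatever units the feed uses. Hypothetical helper."""
    # Map each stop_sequence of the trip to its cumulative distance.
    dist = {
        int(row["stop_sequence"]): float(row["shape_dist_traveled"])
        for row in stop_times
        if row["trip_id"] == trip_id
    }
    return dist[to_seq] - dist[from_seq]


stop_times = [
    {"trip_id": "trip_1", "stop_id": "stop_3", "stop_sequence": "3", "shape_dist_traveled": "0"},
    {"trip_id": "trip_1", "stop_id": "stop_4", "stop_sequence": "4", "shape_dist_traveled": "298"},
    {"trip_id": "trip_1", "stop_id": "stop_5", "stop_sequence": "5", "shape_dist_traveled": "1029"},
]

# Distance from stop_4 to stop_5: 1029 - 298
print(distance_between(stop_times, "trip_1", 4, 5))  # 731.0
```

A distance-based fare could then be derived from that value without any geographic shape being involved.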

Thank you for this info, @skinkie!
If this is accurate, then we would still propose a spec amendment because currently, the spec implies there is a shape.

Curious to have input from others on this one, especially consumers & orgs that see lots of GTFS data (cc @npaun, @bdferris-v2, @flocsy, @drewda, @e-lo, @westontrillium)

@skinkie's use case for defining distance traveled without a shape is one I hadn't considered. Regardless, isn't there precedent to have validation warnings for things that do not inherently violate best practices/hard spec rules, and may in fact be purposive, but that should still be reviewed due to the likelihood of it being in error?

@westontrillium There's precedent, but it's been making the boundaries of the definitions unclear. We'd like to change this. There's an INFO severity outside of ERROR and WARNING that we've been using to flag possible issues with a feed that are not explicitly recommended or banned in the spec or best practices. Info notices here.

It sounds like this issue may be a good candidate for the INFO severity level, if there's alignment that it is not a "should" to have a shape_id for the trip_id referenced in stop_times.txt when the trip has an associated stop_times.shape_dist_traveled.

INFO sounds like the perfect fit, in my estimation.

Hello,

We would like to amend the spec to better reflect what @skinkie highlighted. Here is a proposal:

Actual distance traveled along the trip, from the first stop to the stop specified in this record.
If used alongside shapes.txt, this field specifies how much of the shape to draw between any two stops during a trip, and it must be in the same units used in shapes.shape_dist_traveled.

Thoughts?