google/transit

Missing functionality to define "conceptual grouping of stops/stations" in existing GTFS


Context

Recently, community members from Asia, Europe, and North America have pointed out that the current GTFS lacks a way to define conceptual groupings of stations/stops (in other words, it lacks "another level" of station/stop hierarchy, a station cluster).

Here are some recent relevant discussions:

Members from the Association for Open Data of Public Transportation [ODPT, JP] and the Japan Bus Information Association have also brought up this issue to us.

Issue

According to the current specification, location_type=1 is a "station" (a physical structure or area that contains one or more platforms). This hierarchy does not provide a way to define the station/stop groupings that riders conceptually think of.

Based on the information we've gathered, having a new mechanism could be beneficial for the following use cases:

  1. Directing via station entrance or directing via street networks
    For example, suppose there is a train station and multiple adjacent bus stops on a public street that share the same or similar names. According to the current description of location_type=1, it seems these adjacent bus stops should not be included within the hierarchy of this train station (parent_station). When these bus stops are independent, consumers can correctly direct riders using the street network, rather than mistakenly guiding them through entrances (and pathways). However, this approach has some drawbacks, such as:
  • Confusion in search results when bus stops share the same name as the train station.
  • Unable to set service alerts for the entire conceptual group.

    On the other hand, some producers may include these bus stops within the hierarchy of the train station. However, doing so will lead consumers to give riders incorrect directions (using entrances and pathways).

    The current GTFS lacks a mechanism to define this conceptual grouping while also being able to distinguish whether riders need to pass through an entrance. This could potentially be achieved by adding a field or by introducing a higher-level hierarchy (station cluster).
  2. Searching and visualization
    Consider another example: there are multiple adjacent heavy rail stations, each with its own physical structure and entrances. They serve different routes/agencies and have the same station names. The current GTFS cannot define this station "cluster" that forms a higher-level conceptual grouping. When riders search for this station name, consumers may show them the wrong station, causing confusion.

    In this particular example, a higher level of hierarchy (station cluster) seems necessary if we want to group these stations together, since each of them also serves as a parent station for its own entrances and stops (platforms).

  3. Service alerts
    Producers can directly set service alerts for the entire conceptual group, such as the entire train/subway station complex or multiple adjacent bus stops with identical station names.

  4. Effective Fare Leg
    When the fares v2 working group discussed rules for treating two or more legs as a single leg for fare calculation purposes, we considered how to formulate station-related semantics, how to include stop pairs, and how to handle different modes. A way to define conceptual groupings might help formulate fare_leg_join_rules.
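To make use cases 1 and 3 concrete, here is a minimal sketch of how a consumer might read stops.txt into a parent/child index. The stop IDs and names are hypothetical, and expanding an alert on a station to its child stops is an assumed consumer behavior, not something the spec mandates. Because the street-side bus stop has no parent_station, an alert on the train station does not reach it, and a name search returns two unrelated top-level results.

```python
import csv
import io
from collections import defaultdict

# Hypothetical stops.txt: a train station with one platform and one entrance,
# plus a street-side bus stop that shares the name but has no parent_station.
STOPS_TXT = """stop_id,stop_name,location_type,parent_station
central,Central,1,
central_p1,Central,0,central
central_e1,Central Entrance,2,central
central_bus,Central,0,
"""


def load_stops(text):
    """Return stops keyed by stop_id plus a parent -> children index."""
    stops = {}
    children = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        stops[row["stop_id"]] = row
        if row["parent_station"]:
            children[row["parent_station"]].append(row["stop_id"])
    return stops, children


def expand_alert(stop_id, children):
    """Assumed consumer behavior: an alert on a stop also covers its descendants."""
    affected, queue = [], [stop_id]
    while queue:
        current = queue.pop()
        affected.append(current)
        queue.extend(children.get(current, []))
    return sorted(affected)


def top_level_matches(name, stops):
    """Search results a rider might see: top-level entries with a matching name."""
    return sorted(
        stop_id
        for stop_id, row in stops.items()
        if not row["parent_station"] and row["stop_name"] == name
    )


stops, children = load_stops(STOPS_TXT)
print(expand_alert("central", children))  # ['central', 'central_e1', 'central_p1']
print(top_level_matches("Central", stops))  # ['central', 'central_bus']
```

Pulling central_bus under the station would fix both outputs, but under current pathways semantics it would also route riders through the entrance, which is exactly the trade-off described in use case 1.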

Please share any examples for the above use cases, any other relevant use cases, which use cases are more critical for your region, or any other thoughts here.

The above comment states

location_type=1 is "station" (a physical structure or area that contains one or more platforms). This hierarchy doesn't provide functionality for defining station/stop grouping that riders conceptually think of it.

There might be arguments about this statement as well (e.g., does the word "area" imply it can support conceptual grouping of stations/stops?). We would also like to learn how the community interprets the current location_type=1.

I've experienced variations of this problem before and while the extra layer of hierarchy seems tempting, my favourite solution would be to widen the definition of "station" to include anything that a producer sees as a "group". That definition may vary from place to place and producer to producer. Personally, I've been interpreting the "area" part of the definition that way for a while.

For example, some places may want to model two bus stops on either side of the street serving the same routes as a station so they can send service alerts to this group.

The problem with the pathways/entrances I would solve by explicitly adding pathways from inside the physical rail station to the bus stops outside.

Why isn't the extra layer of hierarchy my preferred solution? NeTEx has this and I've rarely seen use cases that were relevant for a passenger.


If you model different modes in different stations, like the Germans do, it is something you cannot avoid. That this practice doesn't happen in NSR is because the model is too simplistic.

I would consider it to be a "point of interest". Let's say there is "Vienna Airport", but it has separate bus and rail stations, each with a unique ID in the station registries and a slightly different name. Each of these stations in turn has stops to represent platforms or bus stops.

I have thought about this issue too. In Stockholm, Sweden, the regional GTFS data for the operator SL contains three different parent_stations with the same name and different stop IDs.

Station ID A, Slussen contains ferry stops
Station ID B, Slussen contains bus stops
Station ID C, Slussen contains the metro

All of these IDs should, by local knowledge, be considered the same POI: Slussen. The transfer times between the different stations may be 10+ min, but when searching for Slussen, one expects a union of the three stations to be presented.
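A consumer-side workaround for the Slussen case could merge parent stations that share an exact stop_name into one search result. This is only a heuristic sketch: the station IDs A, B, and C come from the example above, the child stop IDs are hypothetical, and exact-name matching will also merge unrelated stations that happen to share a name, which is one reason an explicit grouping in the feed would be preferable.

```python
from collections import defaultdict

# Parent stations from the Slussen example (IDs A, B, C); child IDs are hypothetical.
stations = {
    "A": {"name": "Slussen", "children": ["A1", "A2"]},  # ferry stops
    "B": {"name": "Slussen", "children": ["B1", "B2", "B3"]},  # bus stops
    "C": {"name": "Slussen", "children": ["C1"]},  # metro
}


def group_by_name(stations):
    """Heuristic: treat parent stations with identical names as one POI."""
    groups = defaultdict(list)
    for station_id, station in stations.items():
        groups[station["name"]].append(station_id)
    return groups


def search(name, stations):
    """Return the union of child stops across all stations sharing the name."""
    result = []
    for station_id in group_by_name(stations).get(name, []):
        result.extend(stations[station_id]["children"])
    return sorted(result)


print(search("Slussen", stations))  # ['A1', 'A2', 'B1', 'B2', 'B3', 'C1']
```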

@leonardehrenfried My concern with simply clarifying an expanded definition of a parent station is the guidance for pathways that "[p]athways must be defined exhaustively in a station". This would suggest that pathways would also be needed from bus stops outside a station building to the rail platforms inside, which can result in incorrect journey planning.

At the @mbta, our current practice for bus stops adjacent to rail stations is: Leave the bus stop out of the parent station if it is on the side of a public street, but include the stop in the parent station if it is in a dedicated busway (with no mixed motorized traffic). Already this leads to suboptimal trip planner behavior for bus journeys originating at stations with busways. Take the example of our station Malden Center:

[Images: Malden Center outline; Malden Center trip plan]

In the first image above, the ground floor of the station building and indoor waiting area (the rail tracks are elevated) is outlined in green, with the entrances to the building in purple. The three bus stops are all considered part of the rail parent station, though they are also alongside the neighborhood sidewalk network.

In the second image above, a trip from the nearby neighborhood onto a bus gets routed via the nearest station entrance, making for a circuitous journey.

Shifting the entrance locations to be along the street-side sidewalks would help, but would also then deviate from our riders' colloquial understanding of the station entrances. Perhaps, as a compromise, there is some designation we could add onto stops or pathways to say that using pathways is not required to access the stop?

I have been thinking about the examples presented here and I'm kinda shifting my position.

The pathway argument in the last post and the Vienna airport example are good arguments for having an additional layer of hierarchy. I like the POI framing which is detached from any physical infrastructure.

This is something that would be useful for the New York City Subway as well. We have a multi-level hierarchy of station complexes and stations which cannot be fully modeled in GTFS. In our static GTFS at present, for every "station" (which is a somewhat abstract concept which does not precisely map on to real-world expectations of what a station is), we define the parent station and two child stops, representing the northbound and southbound platforms (an abstraction which breaks down at stations with more complex platform layouts, but that's for another discussion).

However, from a business domain perspective, the hierarchy does not end there. There are three locations where a single "station" actually has two GTFS parent station IDs (West 4 St, 145 St, and Queensboro Plaza), and then these stations can be aggregated into "station complexes" at some locations.

[Image: diagram of how station complexes, stations, and stops nest within each other]

(Note that this depicts an arbitrary scenario that does not exist in real life at any station complex; it is merely intended to show the hierarchy of entities and how they nest within each other.)

At present we maintain several control files external to GTFS (specifically our Stations, Station Complexes, and Stations and Complexes Open Data datasets) which define these higher-level entities; third-party consumers can consume this information at their discretion but it is not as readily available nor as legible as it would be were it to be integrated directly in our GTFS.
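As an illustration of the extra work this pushes onto consumers, the sketch below walks a GTFS stop up to the higher-level entities using external lookup tables. All IDs and table shapes are hypothetical; they are not the actual columns of the MTA open data datasets, only a model of the nesting described above (two GTFS parent stations can map to one station, and stations aggregate into complexes).

```python
# Hypothetical GTFS hierarchy: child stop -> GTFS parent station.
gtfs_parent = {"P1N": "P1", "P1S": "P1", "P2N": "P2", "P3N": "P3"}

# Hypothetical external control tables kept outside GTFS.
parent_to_station = {"P1": "S1", "P2": "S1", "P3": "S2"}  # two parents -> one station
station_to_complex = {"S1": "CX1", "S2": "CX1"}  # stations -> station complex


def resolve(stop_id):
    """Walk a stop up every level, returning (parent, station, complex)."""
    parent = gtfs_parent.get(stop_id, stop_id)  # a parent station resolves to itself
    station = parent_to_station.get(parent)
    complex_id = station_to_complex.get(station)  # None if not part of a complex
    return parent, station, complex_id


print(resolve("P2N"))  # ('P2', 'S1', 'CX1')
```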

(Yes, we could "solve" this by transforming how we model stations and complexes in GTFS to flatten out the hierarchy, but this would have complex interactions with many internal and external systems and it is unclear whether such an undertaking would even be feasible.)

Regarding your description, it feels to me that the GTFS station is a grouping of the conceptual stop area (stops in opposite directions), while the station complex was the original intention of the GTFS station.

Bumping this thread because I missed responding in the first round 😇

I'll start by saying that I ultimately think some additional layer of stop-station hierarchy may be inevitable, but I already have opinions on how it might be used (or abused).

My high-level thesis is that stop-station hierarchy should attempt to model a rider's conception of a stop, station, or complex first-and-foremost. I'm reluctant to add hierarchy just for the purpose of matching an agency's internal control structures. If there are business rules around transfers, fares, or walking directions that can currently only be handled by modeling stations in a way that doesn't match a rider's mental model, then I think we should address those cases separately.

For example, per @tzujenchanmbd's motivating issue #1 and per @jfabi's example above, I agree that there is an issue with bus stops at the perimeter of larger indoor station complexes with entrances. But if riders conceptually think of these perimeter bus stops as part of the larger station complex, then I'd argue we keep the existing stop-station hierarchy mapping (i.e., don't introduce additional hierarchy), but look for other mechanisms to get the walking right. For this specific case, we've recently been floating the Outdoor Bus Stops on the Perimeter of a “Heavy” Station - stops.txt “stop_access” proposal, developed in response to my original thread #1 that @tzujenchanmbd linked above.

When does additional station hierarchy make more sense? I think a good test is whether that additional level of station hierarchy has a name that riders would recognize. This ties into @tzujenchanmbd's motivating issue #2 and @jspetrak's suggestion of something akin to a POI. Here I'm thinking of the Grand Central Terminals of the world, where there is one high-level POI that most people can identify, but there are also individual substations within the larger parent station that riders would identify and search for.

I agree with @bdferris-v2 that modelling this should be done from a rider's perspective, not from implementation needs.

I wish NeTEx and GTFS were more aligned, so I will take the opportunity to explain NeTEx here:

  • GroupOfStations with a purpose (GENERALIZATION or CLUSTER is used in Norway). GENERALIZATION is a group of prominent stop places within a town or city. Smaller stops are excluded, because we do not want riders to end up at these stops, e.g. in a search. CLUSTER is a group of stop places in proximity to each other that have a natural geospatial or public-transport-related relationship. These are stops/stations that share a name known to many riders. Note: a GroupOfStations is not a Station.
  • MultiModalStation: in NeTEx (at least the Nordic profile) a Station cannot serve more than one mode, so a MultiModalStation is simply a station of stations with more than one mode. I think this is a design mistake, but I can see that in many cases it adds value. It is mostly used to display information/icons in maps and so on. A good thing about MultiModalStation is that it allows two levels for complex stations and it is clear how to model them, avoiding different classification rules in different places. It also more or less follows public naming and mental models of complex stations: the "trains and buses at the central station".

For our @mbta use case of stops around stations, we would be in support of @bdferris-v2's stop_access proposal or similar.

I could see us potentially wanting to treat these "exterior" stops slightly differently on our website or information screens (for instance, showing directions, child stop name, etc), but this field would let us differentiate them and add any desired implementation.

--

Separately, we do, as an agency, have an interest in grouping geographically related stops outside of stations (think of two bus stops across the street from each other). We have a very rough, distance-based method that can return stops within a range, but @t2gran's mention of NeTEx's GroupOfStations concept is intriguing as well.
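A rough distance-based grouping like the one mentioned above can be sketched as a haversine radius query. This is not the agency's actual method; the coordinates, stop IDs, and 50 m radius are hypothetical. A pure radius test will also group stops that riders do not consider related (for example, stops on parallel streets), which is the gap an explicit grouping would close.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def stops_within(origin, stops, radius_m):
    """Return IDs of stops within radius_m meters of origin, a (lat, lon) tuple."""
    lat0, lon0 = origin
    return sorted(
        stop_id
        for stop_id, (lat, lon) in stops.items()
        if haversine_m(lat0, lon0, lat, lon) <= radius_m
    )


# Hypothetical pair of stops across the street (about 22 m apart) and one far away.
ORIGIN = (42.3601, -71.0589)
STOPS = {
    "inbound": (42.3601, -71.0589),
    "outbound": (42.3603, -71.0589),
    "far": (42.3701, -71.0589),  # roughly 1.1 km north
}
print(stops_within(ORIGIN, STOPS, 50))  # ['inbound', 'outbound']
```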

I am a staff member of ODPT in Japan. Thank you very much for creating this issue, @tzujenchanmbd .
Sorry for the very late comment. I would like to add some background to the discussion in Japan.
In fact, the case discussed in Japan is the issue of bus stop grouping. (I think it's almost the same problem as the one mentioned as Thread #2 in the GTFS Slack channel.)

I think this is probably a similar situation in other countries, but many bus routes have bus stops facing opposite directions on both sides of the street.
Furthermore, for buses running in urban areas, there are often many bus stops that share the same name and are located along different roads around a single intersection.
The bus route operator treats these bus stops as "different poles at the same bus stop" and guides users by grouping them together.

A bus terminal near a station would have a physical structure, but in this case there is no such thing.
According to the GTFS location_type=1 specification, it seems inappropriate to group these.

I personally think a potential need in Japan is to find out how to appropriately group such bus stops.

Through the discussion of this issue, I understood that there are more complex cases.
However, for now in the Japanese case, we are more interested in figuring out how to group bus stops without physical structures, rather than modeling the structure around stations.

@iniad-bessho About grouping bus stops: the spec says "Station. A physical structure or area that contains one or more platforms."

I've started to interpret "area" to mean exactly what you describe: a group of stops that "belong together" but they don't have any physical infrastructure linking them together.

@leonardehrenfried

Thank you for your comment.
Indeed, if we consider an area without a physical structure to be a station, it seems possible to apply this to grouping of bus stops as well.

However, it seems clear that the treatment of pathways would be different between a normal station and this case.
I personally think that the idea of making a distinction with an additional field, as discussed above, is also reasonable.

I am a board member of Japan Association for Bus Digitalization.
Around 2021, we were discussing this topic with Google and domestic stakeholders.
While we agree with the idea of "conceptual grouping of stops," we have some differing opinions.

Problems

Because a station (location_type=1) is limited to a physical structure or area, the following problems arise:

  1. Grouping
    Data users need to group outdoor stops themselves based on ID regularity, name matching, proximity, etc.

  2. Searchability
    End users need to select from multiple stops with the same name included in the same station when searching for bus stops.

  3. platform_code
    Data creators cannot set platform_code to outdoor stops because platform_code can only be set to "a stop belonging to a station."

  4. Representative Coordinates
    Since the parent stop is undefined, the coordinates to be displayed when zooming out on a map are unknown.
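Problems 1 and 4 can be illustrated together: without a parent stop, a data user has to group poles heuristically and then derive a map point for the group. The sketch below groups by exact stop_name and uses the arithmetic mean of coordinates; the stop IDs, names, and coordinates are hypothetical, and a plain mean is only reasonable for small areas (it ignores the antimeridian, for example). A parent stop carrying its own stop_lat/stop_lon would make this guesswork unnecessary.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical poles sharing a name around one intersection, plus an unrelated stop.
stops = [
    ("s1", "Ekimae", 35.6810, 139.7670),
    ("s2", "Ekimae", 35.6812, 139.7674),
    ("s3", "Ekimae", 35.6808, 139.7672),
    ("s4", "Shiyakusho", 35.6900, 139.7000),
]


def representative_points(stops):
    """Group stops by exact name and use the mean lat/lon as the group's map point."""
    groups = defaultdict(list)
    for _stop_id, name, lat, lon in stops:
        groups[name].append((lat, lon))
    return {
        name: (round(fmean(p[0] for p in pts), 6), round(fmean(p[1] for p in pts), 6))
        for name, pts in groups.items()
    }


print(representative_points(stops)["Ekimae"])  # (35.681, 139.7672)
```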

Proposal

  1. Make station(location_type=1) applicable to conceptual groups as well.
  • Clarify the definition of a station, for example, as "a physical structure, a physical area, or a conceptual group."

If 1. is not acceptable,

  2. Add a new location_type as Cluster
  • Make stop_name, stop_lon, and stop_lat mandatory.

Whether 1 or 2 is adopted, the following rules should be established:

  1. Only allow a two-level hierarchy of parent and child.

  2. Best practices should state that the parent icon is displayed when the map is zoomed out, and the child icon is displayed when zoomed in.
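The two-level rule keeps validation non-recursive: a single parent lookup (a single self-join in SQL) is enough to detect violations. A minimal sketch, assuming parent_of maps each stop_id to its parent_station value (empty for top level); the IDs are hypothetical, and a real validator would also need to exempt existing GTFS cases such as boarding areas (location_type=4) nested under platforms, which this sketch ignores.

```python
def violates_two_levels(parent_of):
    """Return stop IDs whose parent itself has a parent (a third level).

    parent_of maps stop_id -> parent_station ('' or missing means top level).
    Only one lookup per stop is needed, so no recursion is involved.
    """
    return sorted(
        stop_id
        for stop_id, parent in parent_of.items()
        if parent and parent_of.get(parent)
    )


# Hypothetical feed: "x" is nested under "a", which is already under "grp".
parent_of = {"grp": "", "a": "grp", "b": "grp", "x": "a"}
print(violates_two_levels(parent_of))  # ['x']
```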

Opinions

Multi-level hierarchy is unnecessary

  • The complexity of data processing is a significant drawback.

    • Recursive processing would be required, making it difficult to handle with SQL and other tools.
    • During discussions with Google and Japanese stakeholders, we also requested that the two-level hierarchy be maintained.
  • Since GTFS is often created by individual operators, data creators do not need a multi-level hierarchy. They prefer simple specifications.

  • Grouping stops of complex stations should be done by service providers, not data creators.

    • For example, Haneda Airport has multiple stations from different operators. It's practical for service providers to integrate or associate them according to their use cases.
      • Airport: Haneda Airport
      • Monorail: Haneda Airport Terminal 1, Haneda Airport Terminal 2, Haneda Airport Terminal 3
      • Train (Keikyu): Haneda Airport Terminal 1/2, Haneda Airport Terminal 3
      • Bus (Multi agency): Haneda Airport Terminal 1, Haneda Airport Terminal 2, Haneda Airport Terminal 3

It's better not to create a new location_type (Cluster)

  • What constitutes physical is ambiguous, and making this judgment is cumbersome for data creators.
  • Both data creators and users would need to make modifications.
  • Until about 2019, Google accepted parent stations defined for outdoor bus stops.

Direct accessibility is defined by pathway and entrance/exit

  • If a stop is connected to an entrance/exit by pathway, access from the street is restricted.
  • Otherwise, it is either accessible from the street or undefined.
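The accessibility rule above can be sketched as a graph search from every entrance (location_type=2) over pathways.txt links. This is one possible reading, not a spec algorithm: each pathway is treated as an undirected connection (ignoring is_bidirectional), since the rule asks only whether a connection exists, and the stop IDs are hypothetical.

```python
from collections import defaultdict, deque

ENTRANCE = 2  # location_type for an entrance/exit


def entrance_connected(location_types, pathways):
    """Return every stop ID connected, through pathways, to some entrance."""
    graph = defaultdict(set)
    for from_id, to_id in pathways:
        # Treat each pathway as an undirected link: the rule is about connection.
        graph[from_id].add(to_id)
        graph[to_id].add(from_id)
    start = [s for s, lt in location_types.items() if lt == ENTRANCE]
    seen = set(start)
    queue = deque(start)
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def classify_stops(location_types, pathways):
    """Label each stop/platform (location_type 0) per the rule above."""
    connected = entrance_connected(location_types, pathways)
    return {
        stop_id: "via entrance only" if stop_id in connected else "street or undefined"
        for stop_id, lt in location_types.items()
        if lt == 0
    }


# Hypothetical station: entrance e1 -> generic node n1 -> platform p1; bus1 is unlinked.
location_types = {"e1": 2, "n1": 3, "p1": 0, "bus1": 0}
pathways = [("e1", "n1"), ("n1", "p1")]
print(classify_stops(location_types, pathways))
# {'p1': 'via entrance only', 'bus1': 'street or undefined'}
```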

We are in the process of migrating to OTP and therefore GTFS. There is one station in particular, the central station of Hamburg, which has various individual stations that are recognizable to riders: Hamburg Hbf, Hauptbahnhof Nord, Hauptbahnhof Süd, Hauptbahnhof/ZOB, Hauptbahnhof/Mönckebergstraße, Hauptbahnhof/Kirchenallee.

These stations appear with their names in the itineraries of the trips.

But it does make sense to start your search only in "Hauptbahnhof" or end it there because you don't care which one of these stations is your start or destination. For this purpose, we are currently using a meta station "Hauptbahnhof" which is equivalent unidirectionally to all the stations. In our current journey planning, if you search from this station to another place, any of the starting stations is ok and you won't have a footpath.

If you search from Hamburg Hbf, then departures from other stations can still be found but will include a footpath from the Hamburg Hbf station to the other one, making this departure more expensive.
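The meta-station behavior described above can be sketched as an access-cost rule: starting from the group, every member station costs nothing to reach, while starting from one member adds a footpath to the others. This is not how OTP implements it; the walk and ride times are hypothetical, and only the station names come from the example.

```python
# Hypothetical walking times in minutes between member stations.
WALK = {
    ("Hamburg Hbf", "Hauptbahnhof Nord"): 4,
    ("Hamburg Hbf", "Hauptbahnhof Süd"): 5,
}

# Hypothetical meta station grouping the member stations.
GROUPS = {"Hauptbahnhof": {"Hamburg Hbf", "Hauptbahnhof Nord", "Hauptbahnhof Süd"}}


def access_cost(origin, station):
    """Minutes needed to reach `station` from `origin` before boarding (None = no path)."""
    if origin in GROUPS:
        # Meta station: any member is an equally valid start, with no footpath.
        return 0 if station in GROUPS[origin] else None
    if origin == station:
        return 0
    if (origin, station) in WALK:
        return WALK[(origin, station)]
    return WALK.get((station, origin))


def best_option(origin, options):
    """Pick the (station, total_minutes) with the lowest walk + ride time."""
    best = None
    for station, ride_minutes in options:
        walk = access_cost(origin, station)
        if walk is None:
            continue
        total = walk + ride_minutes
        if best is None or total < best[1]:
            best = (station, total)
    return best


OPTIONS = [("Hauptbahnhof Nord", 10), ("Hamburg Hbf", 12)]  # hypothetical rides
print(best_option("Hauptbahnhof", OPTIONS))  # ('Hauptbahnhof Nord', 10)
print(best_option("Hamburg Hbf", OPTIONS))  # ('Hamburg Hbf', 12)
```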

So far, we haven't found a way to represent this in GTFS and subsequently OTP. A grouping of stops/stations would make sense in that regard.


OTP supports NeTEx too ;-)

I'm aware. We initially looked at NeTEx, but its complexity and difficulty of understanding are the reasons we use GTFS.


I really don't understand why, if you are in Germany and can get NeTEx for free from the National Access Point / Delfi, you would go the GTFS route first. https://www.opendata-oepnv.de/ht/de/willkommen

It's because its quality is really not good. You're better off not using it.


Citation needed.

One thing for sure: OpenTripPlanner is completely 'tuned' towards the Nordic NeTEx Profile.

My colleague Holger collects issues with the GTFS version of the feed here: https://github.com/mfdz/GTFS-Issues/issues?q=is%3Aopen+is%3Aissue

There is no reason to believe that the NeTEx version is any better.

A nicely done NeTEx feed is a wonderful thing but Delfi isn't one of those.

Exactly my point. The GTFS built by Delfi comes from exactly the same source as the NeTEx publication, an IVU.cloud system, right?
The NeTEx file does have a stop hierarchy.

But considering the above, why would @2martens be better off with GTFS then?

Both GTFS and Netex are bad.

Yes, which is why we are building our own GTFS feed directly from the ISA file format that we get from the Hochbahn. There is a difference between GTFS as a file standard and the specific feed available from Delfi. And as a standard GTFS is miles easier to understand and produce than NeTEx.

And for building our own GTFS feed, a conceptual grouping of stations would be helpful, to get back on topic.

In Japan's case, the missing functionality to properly group a set of bus stops (without physical structure) has been one of the main focuses of discussion.
Through the discussion of this issue, I understood that "conceptual grouping of stops/stations" includes multiple cases.

In my personal opinion, it seems appropriate to separate the following two discussions:

  • How to logically group multiple stops
  • How to introduce a higher-level hierarchy (station cluster)

The former discussion would be resolved if some specification change allows stations (location_type=1) to be explicitly applied to an area without a physical structure.
(If so, it seems the interpretation of pathways should also be modified.)
In this case, I agree that a multi-level hierarchy should be avoided.

The latter discussion seems necessary when creating a large-scale feed that includes multiple operators, which is not yet common in Japan, where feeds are divided by operator.

Thanks for all the comments over the past few months!

Based on the discussions, we can observe the following:

  • The current definition of station (location_type=1) - "A physical structure or area" has led to ambiguity. Some interpret it as allowing "conceptual grouping" (with potentially different uses for grouping), while others interpret it as only allowing grouping within a single physical structure, etc.
  • Due to the current lack of a mechanism to explicitly define whether riders need to pass through an entrance to reach a stop, allowing the stop-station hierarchy to be used for "conceptual grouping" under the current pathways semantics may lead to routing problems. (#1 @jfabi & #2 @bdferris-v2 )
  • There is a need for "conceptual grouping." (such as bus stops at the perimeter of a heavy rail station & bus stop groups in Japan @iniad-bessho @takohei )
  • There is a need for a higher layer of hierarchy. (such as high-level POI @jspetrak & Hamburg Central @2martens )

However, given the current ambiguity surrounding the stop-station hierarchy, immediately introducing a higher level may exacerbate the ambiguity issue. Therefore, I propose a 3-phase plan as follows:

Phase 1 - Officially adopt the stops.stop_access proposal. This can resolve the aforementioned routing issues and "explicitly" open up the possibility of using the stop-station hierarchy for conceptual grouping not limited to a single physical structure.
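To show how Phase 1 would untangle grouping from routing, here is a sketch of a trip planner branching on stop_access. The value semantics are an assumption based on the proposal as discussed in this thread (0: reach the stop through the station's entrances/pathways; 1: reach it directly from the street network), and the fallback for an empty value is a hypothetical consumer choice; check the adopted spec text before relying on either.

```python
def access_mode(stop):
    """Decide how a router should connect a stop to the street network.

    `stop` is a dict of stops.txt fields, with string values as read from CSV.
    """
    if not stop.get("parent_station"):
        return "street"  # standalone stop: always reached from the street
    if stop.get("stop_access") == "1":
        return "street"  # grouped under a station, but reachable from the street
    # "0", or (hypothetical fallback) empty: route via entrances and pathways.
    return "pathways"


# Hypothetical Malden Center-style case: a perimeter bus stop inside the station group.
print(access_mode({"parent_station": "malden", "stop_access": "1"}))  # street
print(access_mode({"parent_station": "malden", "stop_access": "0"}))  # pathways
```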

Phase 2 - MobilityData and the community collaboratively create a station modeling guide, which will include various real-world use cases, images, and data examples of how to model using the stop-station hierarchy (including the new stop_access field).
Use cases could include bus stops on opposite sides of the road with the same name, indoor or outdoor bus terminal, bus stops at the perimeter of a heavy rail station, cluster of heavy rail stations with the same or similar names, etc.

Phase 3 - Once the community reaches a consensus on the use of the stop-station hierarchy and there is a guide document in place, consider introducing a higher layer of hierarchy. (Minimize the potential ambiguity of this higher layer)

Happy to hear any thoughts/comments on this plan.

Hello everyone, and thank you for these constructive exchanges.

We are facing exactly the same questions in Paris/France regarding the generation of GTFS in open data. For several years, we have interpreted the concept of "parent_station" as being "conceptual" rather than "physical". The recent addition of the "areas" and "location" files has raised new questions for us.

The proposed plan seems very good to me: it remains backward compatible, it starts by resolving ambiguities, and will eventually allow for new features.
For your information, in Paris we have linked "Entrance/Exit" (location_type=2) to "stop/platform" (location_type=0). This model allows us to handle all the cases mentioned (no connection with bus stops, for example, but also the many entrances/exits reserved for specific platforms within stations).

Thank you again for reading this thread!

Thank you very much for proposing the 3-phase plan.
First of all, we totally agree with clarifying the definition of the station.

Regarding Phase 1,
it would be great if logical groupings were explicitly allowed in the GTFS specification, by accepting the stop_access proposal.

Regarding Phase 2,
we would like the use case to include several practical cases of bus stop grouping.
The cases may include:

  • Bus stops on opposite sides of the road
  • A bus terminal with multiple bus stops nearby a train station
  • Three or more bus stops that share the same name and are located nearby (e.g. around the same intersection)