mfdz/GTFS-Issues

DELFI (and others): Use wikidata entity id as agency_id

Current issue(s)
For all agencies in the DELFI GTFS feed, agency_url is set to https://www.delfi.de, and for most of them no further information (besides the name) is provided. As for the agency IDs, it is unclear who maintains them and whether they are stable across feed versions.

Enhancement/addition I'd like to suggest
As all of this information should be publicly available, and many agencies are already present in Wikidata, I suggest using Wikidata entity IDs as identifiers. Further information could then be linked to agencies, and unique IDs across GTFS feeds would be achieved automatically.

This would also be a step towards promoting linked open data in the transit domain.

Downsides
Currently, the DELFI feed often uses a kind of "dummy agency" (see #107) that would not exist in Wikidata. Personally, I consider this bad practice, and the recommendation to use Wikidata entity IDs could underline that real agencies should be specified. As long as this is not the case, agency_ids that do not refer to an existing Wikidata entity should at least not use the Wikidata entity ID format, i.e. they should not start with a Q followed by digits.
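To make the format rule concrete, here is a minimal sketch of how a feed validator might flag agency_ids that look like Wikidata entity IDs. The function name and the exact pattern ("Q" followed only by digits) are my assumptions for illustration; they are not part of DELFI's tooling or of the GTFS specification. The check only looks at the format and does not verify that the entity actually exists in Wikidata.

```python
import re

# Assumption: an ID "in Wikidata entity ID format" is a capital Q
# followed by one or more digits and nothing else.
QID_PATTERN = re.compile(r"Q[0-9]+")


def looks_like_wikidata_qid(agency_id: str) -> bool:
    """Return True if agency_id has the shape of a Wikidata entity ID."""
    return QID_PATTERN.fullmatch(agency_id) is not None
```

A dummy agency should then fail this check, so that any ID passing it can be treated as a candidate for a Wikidata lookup.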

Last update of GTFS Feed
2024-04-02

GTFS Feed Download Link
Open-Data ÖPNV

To start collecting the entity identifiers and match them with the current agency_id, I started this DELFI GTFS Agencies Google Sheet. Feel free to create missing agencies in wikidata.

Your suggestion is an interesting approach. It will be included in our internal discussion about adapting agency.txt.
Using the example of the transit associations in Baden-Württemberg, I would like to point out the following restriction: the company Friedrich Müller Omnibus operates in both the VVS and the HNV, and each association has its own internal ID for Friedrich Müller Omnibus. It will be difficult to merge these two IDs in our data collector and reference them to the Wikidata ID.

Best Regards
BeckertAnke (DELFI-Team)

Thanks for considering it in the further discussion. I guess this ID merge restriction is also the reason for the current _G or _D suffixes in stops.txt or routes.txt? If that's the case, I think a general solution for merging equivalent entities provided by different agencies needs to be found. If the collector itself can't merge them, a post-processing step might be required(?)
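As a rough idea of what such a post-processing step could look like, here is a sketch that remaps source-specific agency_ids to one canonical ID and then drops the resulting duplicate agencies. Everything concrete in it is hypothetical: the IDs "vvs_17" and "hnv_42", the route IDs, and the placeholder "Q_EXAMPLE" (deliberately not a real Wikidata ID) are invented for illustration, and the mapping table would have to be curated by hand, e.g. from the spreadsheet mentioned above.

```python
import csv
import io


def read_rows(csv_text):
    """Parse GTFS CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def remap_agency_ids(rows, id_map):
    """Replace each agency_id by its canonical ID; unmapped IDs stay unchanged."""
    return [
        {**row, "agency_id": id_map.get(row["agency_id"], row["agency_id"])}
        for row in rows
    ]


def dedupe_agencies(rows):
    """Keep only the first row seen for each agency_id."""
    first_seen = {}
    for row in rows:
        first_seen.setdefault(row["agency_id"], row)
    return list(first_seen.values())


# Hypothetical input: the same operator delivered under two internal IDs.
AGENCY_TXT = """agency_id,agency_name
vvs_17,Friedrich Müller Omnibus
hnv_42,Friedrich Müller Omnibus
"""
ROUTES_TXT = """route_id,agency_id
r1,vvs_17
r2,hnv_42
"""
# Hand-curated mapping; "Q_EXAMPLE" is a placeholder, not a real entity ID.
ID_MAP = {"vvs_17": "Q_EXAMPLE", "hnv_42": "Q_EXAMPLE"}

agencies = dedupe_agencies(remap_agency_ids(read_rows(AGENCY_TXT), ID_MAP))
routes = remap_agency_ids(read_rows(ROUTES_TXT), ID_MAP)
```

The same remapping would have to be applied to every file that references agency_id (routes.txt, fare_attributes.txt, ...), and keeping the first row is only a simple tie-break; conflicting agency attributes would need a real merge rule.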

You're right. The _G suffixes are added to the GTFS feed as a result of data merging. We do not want the _D suffixes either. We are following this up with the data suppliers and asking them to provide us with correct data sets.

Merging agencies is hard work ;-)