TransitTalk/Transit-Talk

Fix Duplicate Stops


Using Chicago Transit Authority data for the Yellow Line, for example, duplicate stops appear because they are marked with different GPS coordinates. We need to figure out a better algorithm for creating stop links and marking duplicates.
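
One possible starting point, sketched here only as an illustration and not as the project's current algorithm: treat two stops as duplicates when they share a stop_name and lie within some distance of each other. The function names and the 50 m threshold below are hypothetical, and stops are assumed to be parsed into dicts keyed by the GTFS stops.txt column names.

```python
import math
from itertools import combinations

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def find_duplicate_stops(stops, max_distance_m=50.0):
    """Return (stop_id, stop_id) pairs that share a stop_name (case-insensitive)
    and lie within max_distance_m of each other.

    The 50 m default is an arbitrary assumption and would need tuning on real data.
    """
    by_name = {}
    for stop in stops:
        by_name.setdefault(stop["stop_name"].strip().lower(), []).append(stop)

    pairs = []
    for group in by_name.values():
        for a, b in combinations(group, 2):
            d = haversine_m(float(a["stop_lat"]), float(a["stop_lon"]),
                            float(b["stop_lat"]), float(b["stop_lon"]))
            if d <= max_distance_m:
                pairs.append((a["stop_id"], b["stop_id"]))
    return pairs


# Hypothetical example: stops 1 and 2 are about 11 m apart; stop 3 is about 11 km away.
stops = [
    {"stop_id": "1", "stop_name": "Main St", "stop_lat": "42.0", "stop_lon": "-87.0"},
    {"stop_id": "2", "stop_name": "Main St", "stop_lat": "42.0001", "stop_lon": "-87.0"},
    {"stop_id": "3", "stop_name": "Main St", "stop_lat": "42.1", "stop_lon": "-87.0"},
]
print(find_duplicate_stops(stops))  # [('1', '2')]
```

Comparing only stops that share a name keeps the pairwise check small; a name-only match would wrongly merge distinct stops that happen to share a name across a city.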

Although I know you mentioned a move to Transitland, I thought we could still try to identify what is going on in the GTFS file provided here. Taking the case of Oakton-Skokie (on the Yellow Line), we see the following three entries (with the attributes included for reference):

stop_id,stop_code,stop_name,stop_desc,stop_lat,stop_lon,location_type,parent_station,wheelchair_boarding
30297,,"Oakton-Skokie","",42.02624348,-87.74722084,0,41680,1
30298,,"Oakton-Skokie","",42.02624348,-87.74722084,0,41680,1
41680,,"Oakton-Skokie","",42.02624348,-87.74722084,1,,1
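
To confirm which columns actually differ among these rows, a quick diagnostic can group rows that share a stop_name and exact coordinates and list the columns that vary within each group. This is only a sketch: the helper names are hypothetical, and the three rows are the excerpt above embedded as a string rather than read from the real stops.txt.

```python
import csv
import io
from collections import defaultdict

# The three Oakton-Skokie rows quoted above, embedded for a self-contained example.
STOPS_CSV = """stop_id,stop_code,stop_name,stop_desc,stop_lat,stop_lon,location_type,parent_station,wheelchair_boarding
30297,,"Oakton-Skokie","",42.02624348,-87.74722084,0,41680,1
30298,,"Oakton-Skokie","",42.02624348,-87.74722084,0,41680,1
41680,,"Oakton-Skokie","",42.02624348,-87.74722084,1,,1
"""


def group_identical_locations(rows):
    """Group rows that share stop_name and exactly the same coordinate strings."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["stop_name"], row["stop_lat"], row["stop_lon"])].append(row)
    return groups


def differing_columns(rows):
    """Sorted list of columns whose values are not identical across all rows."""
    return sorted(col for col in rows[0] if len({row[col] for row in rows}) > 1)


rows = list(csv.DictReader(io.StringIO(STOPS_CSV)))
for key, group in group_identical_locations(rows).items():
    if len(group) > 1:
        print(key, differing_columns(group))
# ('Oakton-Skokie', '42.02624348', '-87.74722084') ['location_type', 'parent_station', 'stop_id']
```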

The stop_id values differ, but that's expected. Everything else is the same, except that the last entry (41680) has no parent_station listed, so it seems to be treated as something completely new, even though the other two entries list 41680 as their parent_station. Its location_type is also different (1 instead of 0), and it looks like that could cause double listings as well. Per Google's Transit API documentation:

0 or blank: Stop. A location where passengers board or disembark from a transit vehicle.
1: Station. A physical structure or area that contains one or more stops.
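
Given those two location types, one way to avoid the double listing would be to collapse every stop onto its parent station before building stop links. The sketch below assumes rows are dicts keyed by the stops.txt column names; `station_key` and `collapse_to_stations` are hypothetical helpers, not existing project code.

```python
def station_key(stop):
    """Map a GTFS stops.txt row to the id that should represent it in stop links.

    A station (location_type "1") represents itself. Any other row that has a
    parent_station (e.g. a stop, location_type "0" or blank) is folded into that
    station. A row without a parent_station stays on its own.
    """
    if stop.get("location_type", "") == "1":
        return stop["stop_id"]
    return stop.get("parent_station") or stop["stop_id"]


def collapse_to_stations(stops):
    """Group stop_ids under their station key, preserving input order."""
    groups = {}
    for stop in stops:
        groups.setdefault(station_key(stop), []).append(stop["stop_id"])
    return groups


# The Oakton-Skokie rows, reduced to the relevant columns.
oakton_skokie = [
    {"stop_id": "30297", "location_type": "0", "parent_station": "41680"},
    {"stop_id": "30298", "location_type": "0", "parent_station": "41680"},
    {"stop_id": "41680", "location_type": "1", "parent_station": ""},
]
print(collapse_to_stations(oakton_skokie))  # {'41680': ['30297', '30298', '41680']}
```

This relies only on the parent_station link, so it would not catch duplicates that have no parent station and differ only slightly in coordinates.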

So, Oakton-Skokie is an example of both a stop and a station, since there is a physical station at Oakton-Skokie. (Side/historical note: this was part of the Skokie Swift, so the way its merger into the CTA system was documented may have introduced some inconsistencies compared with the CTA's originally designed lines. Highly doubtful, but it might explain why some stops/stations are treated differently than others, especially since this is a fairly old system (originally opened in the 1920s, closed in 1948).)

@rjaltman, can you look into the Transitland data and see whether the issues raised in your most recent comment are handled by their format?

This was mostly dealt with by the move to Transitland, but there are still some duplicates on bus lines. That will require further investigation.