cal-itp/gtfs-aggregator-checker

Feature request: have the ability to return URLs not in input data within a certain region


User Story (Cal-ITP)

As a research data analyst,
I want to know whether feed aggregator websites list GTFS URLs that are more up to date than the ones Cal-ITP has
so that I can maintain a database of the GTFS URLs of the CA transit agencies
and so that I have additional sources of information indicating which GTFS URLs transit agencies publish

User Story (Community User)

As a transit application developer,
I want to get a list of all GTFS URLs on all feed aggregator websites for a particular region
so that I have a complete list of GTFS URLs from which to download the data that powers my transit application

Acceptance Criteria

Given

  1. The input GTFS URLs given to any of the command-line input options of this program
  2. The input aggregator regions to check

For Transitland, it seems that agencies can be queried to determine where they operate, and the results can be compared with the feeds found based on the input URLs. The command-line arguments could look something like this:

--transit-land-adm1_iso=US-CA
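A minimal sketch of how the proposed Transitland flag might turn into a request, assuming Transitland's v2 REST API exposes an `agencies` endpoint that accepts `adm1_iso`, `apikey` and `limit` query parameters (these names should be checked against the Transitland API documentation before relying on them). The helper `transitland_agencies_url` and the endpoint constant are hypothetical; the agencies returned would still need their feed URLs extracted before comparing them with the input URLs.

```python
from urllib.parse import urlencode

# Assumed base URL of Transitland's v2 REST agencies endpoint (verify against the API docs).
TRANSITLAND_AGENCIES_ENDPOINT = "https://transit.land/api/v2/rest/agencies"


def transitland_agencies_url(adm1_iso: str, api_key: str, limit: int = 100) -> str:
    """Build a query URL for agencies operating in one first-level admin region.

    adm1_iso is an ISO 3166-2 subdivision code such as "US-CA".
    The query parameter names adm1_iso, apikey and limit are assumptions.
    """
    params = {"adm1_iso": adm1_iso, "apikey": api_key, "limit": limit}
    # urlencode keeps the dict's insertion order and percent-encodes unsafe characters.
    return f"{TRANSITLAND_AGENCIES_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(transitland_agencies_url("US-CA", "YOUR_API_KEY"))
```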

For TransitFeeds, the currently hardcoded location could be made configurable via a command-line argument:

--transit-feeds-location=67-california-usa
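One way the two proposed flags could be declared, sketched with the standard library's `argparse` purely for illustration (the checker's actual CLI framework may differ). The `dest` names, the `None` defaults meaning "skip the region search", and the `https://transitfeeds.com/l/<location>` page pattern used by the hypothetical `transitfeeds_location_url` helper are all assumptions.

```python
import argparse
from typing import List, Optional

# Assumed URL pattern for TransitFeeds location pages, e.g. .../l/67-california-usa.
TRANSITFEEDS_LOCATION_BASE = "https://transitfeeds.com/l/"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="gtfs_aggregator_checker")
    # Proposed flag: ISO 3166-2 code of the region to search on Transitland.
    parser.add_argument("--transit-land-adm1_iso", dest="transitland_adm1_iso", default=None)
    # Proposed flag: TransitFeeds location slug that replaces the hardcoded value.
    parser.add_argument("--transit-feeds-location", dest="transitfeeds_location", default=None)
    return parser


def transitfeeds_location_url(location: str) -> str:
    """Turn a location slug such as "67-california-usa" into a location page URL."""
    return TRANSITFEEDS_LOCATION_BASE + location


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    return build_parser().parse_args(argv)
```

With neither flag given, both attributes stay `None`, so the region comparison can simply be skipped and the output would contain only the per-input-URL results.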

  3. The GTFS URLs found on the aggregator websites for their respective regions

Then the URLs found on the aggregator websites that aren't in the input URL list should be reported in a separate section of the output.
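At its core this is a set difference between the region's aggregator URLs and the input URLs. A minimal sketch, assuming both are plain lists of URL strings; `urls_not_in_input` is a hypothetical helper name, not part of the existing checker.

```python
from typing import Iterable, List, Set


def urls_not_in_input(aggregator_urls: Iterable[str], input_urls: Iterable[str]) -> List[str]:
    """Return aggregator URLs that are absent from the input list.

    Order of first appearance is kept and duplicates are dropped. The
    comparison is an exact string match; no URL normalization is attempted.
    """
    known: Set[str] = set(input_urls)
    seen: Set[str] = set()
    additional: List[str] = []
    for url in aggregator_urls:
        if url in known or url in seen:
            continue  # already an input URL, or already reported once
        seen.add(url)
        additional.append(url)
    return additional
```

Because the match is exact, `http://` and `https://` variants of the same feed would be reported as different URLs; whether to normalize them is left open here.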

Example:

When searching for all TransitFeeds URLs in Saskatchewan, Canada, while also checking a single input URL, the CLI input and result could look as follows:

CLI Input

python -m gtfs_aggregator_checker --url https://opengis.regina.ca/reginagtfs/google_transit.zip --output results.json --transit-feeds-location=196-saskatchewan-canada

JSON Output

{
  "input_url_results": {
    "https://opengis.regina.ca/reginagtfs/google_transit.zip": {
      "transitfeeds": {
        "public_web_url": "https://transitfeeds.com/p/the-city-of-regina/830",
        "status": "present"
      },
      "transitland": {
        "public_web_url": "https://www.transit.land/feeds/f-c8vx-thecityofregina",
        "status": "present"
      }
    }
  },
  "additional_aggregator_urls_in_region_not_in_input_list": [
    {
      "transitfeeds_metadata": {
        "name": "Saskatoon Transit GTFS",
        "public_web_url": "https://transitfeeds.com/p/city-of-saskatoon/264",
        "type": "GTFS Schedule"
      },
      "url": "http://apps2.saskatoon.ca/app/data/google_transit.zip"
    },
    {
      "transitfeeds_metadata": {
        "name": "Saskatoon Transit Service Alerts",
        "public_web_url": "https://transitfeeds.com/p/city-of-saskatoon/842",
        "type": "GTFS Realtime Service Alerts"
      },
      "url": "http://apps2.saskatoon.ca/app/data/Alert/Alerts.pb"
    },
    {
      "transitfeeds_metadata": {
        "name": "Saskatoon Transit Trip Updates",
        "public_web_url": "https://transitfeeds.com/p/city-of-saskatoon/841",
        "type": "GTFS Realtime Trip Updates"
      },
      "url": "http://apps2.saskatoon.ca/app/data/TripUpdate/TripUpdates.pb"
    },
    {
      "transitfeeds_metadata": {
        "name": "Saskatoon Transit Vehicle Positions",
        "public_web_url": "https://transitfeeds.com/p/city-of-saskatoon/840",
        "type": "GTFS Realtime Vehicle Positions"
      },
      "url": "http://apps2.saskatoon.ca/app/data/Vehicle/VehiclePositions.pb"
    }
  ]
}
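The JSON above could be assembled along these lines. This sketch assumes each TransitFeeds feed found in the region is a dict with `url`, `name`, `public_web_url` and `type` keys (a hypothetical record shape that mirrors the output fields), and it treats the keys of `input_url_results` as the input URL list; `build_report` is a hypothetical name.

```python
import json
from typing import Any, Dict, Iterable, List


def build_report(input_url_results: Dict[str, Any],
                 region_feeds: Iterable[Dict[str, str]]) -> Dict[str, Any]:
    """Combine per-input results with region feeds that were not in the input list."""
    additional: List[Dict[str, Any]] = []
    for feed in region_feeds:
        # Feeds whose URL was an input are already reported under input_url_results.
        if feed["url"] in input_url_results:
            continue
        additional.append({
            "transitfeeds_metadata": {
                "name": feed["name"],
                "public_web_url": feed["public_web_url"],
                "type": feed["type"],
            },
            "url": feed["url"],
        })
    return {
        "input_url_results": input_url_results,
        "additional_aggregator_urls_in_region_not_in_input_list": additional,
    }


def report_json(input_url_results: Dict[str, Any], region_feeds: Iterable[Dict[str, str]]) -> str:
    return json.dumps(build_report(input_url_results, region_feeds), indent=2)
```

Key order in the serialized output follows insertion order unless `sort_keys=True` is passed to `json.dumps`.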

Based on Evan's input, this isn't a pressing priority and we can defer work on this. Removing it from our current sprint and setting it to Sprint: 5/16 - 5/27 for tracking purposes.

Please icebox this indefinitely.