
A tool that uses the gtfs-realtime-validator to calculate the quality of a large number of GTFS-realtime feeds


transit-feed-quality-calculator

A project that uses the gtfs-realtime-validator to assess the quality of a large number of transit feeds.

This tool:

  1. Fetches the URLs for GTFS-realtime feeds and the corresponding GTFS data from either the TransitFeeds.com GetFeeds API or a specified .csv file, and downloads the files from each agency's server into a subdirectory per region
  2. Runs the gtfs-realtime-validator Batch Processor on each of the subdirectories
  3. Produces summary statistics and graphs, such as:

[Image: example summary graphs of feed quality]
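Step 2 runs the Batch Processor once per subdirectory. The following minimal java.nio sketch shows how those subdirectories could be enumerated. It assumes that each immediate child directory of the output directory holds one region's files. The class and method names (FeedDirectories, listFeedDirectories) are hypothetical and not part of this project.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of transit-feed-quality-calculator
class FeedDirectories {

    // Returns the immediate subdirectories of root (assumed: one per region), sorted by name.
    // Plain files directly inside root are skipped.
    static List<Path> listFeedDirectories(Path root) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(root)) {
            for (Path child : stream) {
                if (Files.isDirectory(child)) {
                    result.add(child);
                }
            }
        }
        Collections.sort(result); // Path implements Comparable<Path>
        return result;
    }
}
```

For example, listFeedDirectories(Paths.get("output")) would return one Path per region folder, and each folder could then be handed to the validator.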

Read more in this Medium article.

Running the application

You'll need JDK 7 or higher.

This project was created in IntelliJ. You can also compile it from the command line using Maven.

If you're downloading GTFS or GTFS-rt data from secure HTTPS URLs, you may need to install the Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files. To do so, replace the US_export_policy.jar and local_policy.jar files in your JRE's lib/security directory (such as C:\Program Files\Java\jdk1.8.0_73\jre\lib\security) with the JAR files from the JCE download. Alternatively, you can add -Djsse.enableSNIExtension=false to the command line when running the application.
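You can check whether the policy files are needed by asking the JVM for its maximum allowed AES key length with javax.crypto.Cipher.getMaxAllowedKeyLength. The sketch below also sets the jsse.enableSNIExtension system property from code, which should behave like the -D flag. This assumes the call runs before the first HTTPS connection is opened. The class CryptoCheck is hypothetical.

```java
import java.security.NoSuchAlgorithmException;
import javax.crypto.Cipher;

// Hypothetical helper, not part of transit-feed-quality-calculator
class CryptoCheck {

    // True if the JVM's crypto policy allows AES keys longer than 128 bits,
    // i.e., the unlimited-strength policy is already in effect
    static boolean hasUnlimitedCrypto() {
        try {
            return Cipher.getMaxAllowedKeyLength("AES") > 128;
        } catch (NoSuchAlgorithmException e) {
            // AES is required on every Java platform, so this should not happen
            throw new IllegalStateException(e);
        }
    }

    // Programmatic counterpart of -Djsse.enableSNIExtension=false.
    // Assumption: this must run before the first HTTPS connection is opened,
    // since the JSSE implementation reads the property only once.
    static void disableSni() {
        System.setProperty("jsse.enableSNIExtension", "false");
    }
}
```

If hasUnlimitedCrypto() returns true, the unlimited policy is already active and replacing the JAR files is unnecessary. Newer JDK releases enable this policy by default.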

To download feeds, you'll also need a TransitFeeds.com API key or a .csv file that includes feed information (see below).

Command line

  1. mvn package
  2. java -Djsse.enableSNIExtension=false -jar target/transit-feed-quality-calculator-1.0.0-SNAPSHOT.jar -directory output -transitFeedsApiKey 1234567689 -csv feeds.csv

Note that to download feeds, you'll need to provide either a TransitFeeds.com API key or a .csv file that includes feed information.

See the Command line options section below for a description of each option.

IntelliJ

Run the Main.main() method and provide the command-line options via the "Run/Debug Configurations > Program arguments" field.

Command line options

  • -directory "output" - (Required) - The directory to which feeds will be downloaded (in this case, output) and to which validation and analysis files will be written
  • -transitFeedsApiKey YOUR_API_KEY - (Optional) - Your TransitFeeds.com API key (in this case, YOUR_API_KEY)
  • -csv "feeds.csv" - (Optional) - A CSV file holding feed information (in this case, feeds.csv; you can give it any name)
  • -forceGtfsDownload false - (Optional) - If false, a new GTFS file will not be downloaded for a feed that already has a GTFS file on disk. If true, or if the option is omitted, a new GTFS file will always be downloaded and will overwrite any existing GTFS file for each feed.
  • -errorsToIgnore "E017,E018" - (Optional) - A comma-delimited list of errors to ignore when calculating summary error results and generating the Excel file. By default, errors that examine sequential feed iterations (E017, E018) are ignored, because archived files may not have been collected iteratively (see TransitFeedQualityCalculator.java). Setting a value via this command-line option overrides the default.
  • -warningsToIgnore "W007,W008" - (Optional) - A comma-delimited list of warnings to ignore when calculating summary warning results and generating the Excel file. By default, warnings that examine sequential feed iterations (W007, W008) are ignored, because archived files may not have been collected iteratively (see TransitFeedQualityCalculator.java). Setting a value via this command-line option overrides the default.

If you want to download feeds, either the -transitFeedsApiKey or the -csv parameter must be provided. If both are missing, this tool skips downloading and proceeds to validate and analyze the feeds already in -directory.
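The option rules above can be summarized in code. This is a hedged sketch: the class FeedOptions and its methods are hypothetical, and the real parsing lives in Main.main() and TransitFeedQualityCalculator.java. It models only the documented behavior. -forceGtfsDownload defaults to true, E017/E018 and W007/W008 are ignored by default, and feeds are downloaded only when an API key or a CSV file is given.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical model of the documented command-line semantics
class FeedOptions {
    // Documented defaults, used when -errorsToIgnore / -warningsToIgnore are omitted
    static final String DEFAULT_ERRORS_TO_IGNORE = "E017,E018";
    static final String DEFAULT_WARNINGS_TO_IGNORE = "W007,W008";

    // Splits a comma-delimited list such as "E017,E018" into a set of IDs.
    // A provided value replaces the default entirely; null (option omitted) uses the default.
    static Set<String> parseIgnoreList(String value, String defaultValue) {
        String raw = (value != null) ? value : defaultValue;
        Set<String> ids = new LinkedHashSet<>();
        for (String id : raw.split(",")) {
            String trimmed = id.trim();
            if (!trimmed.isEmpty()) {
                ids.add(trimmed);
            }
        }
        return Collections.unmodifiableSet(ids);
    }

    // -forceGtfsDownload: omitted (null) or "true" means always re-download GTFS
    static boolean forceGtfsDownload(String value) {
        return value == null || Boolean.parseBoolean(value);
    }

    // Download only if an API key or a CSV file is provided;
    // otherwise the files already in -directory are validated and analyzed
    static boolean shouldDownload(String apiKey, String csvFile) {
        return apiKey != null || csvFile != null;
    }
}
```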

The feeds.csv file should be formatted as follows:

region_id,title,gtfs_url,gtfs_rt_url
"10000-Portland, OR, USA","TriMet Trip Update",https://developer.trimet.org/schedule/gtfs.zip,http://developer.trimet.org/ws/V1/TripUpdate&appID=225D5601E7729B9ED863DCA39
"10000-Portland, OR, USA","TriMet Alerts",https://developer.trimet.org/schedule/gtfs.zip,http://developer.trimet.org/ws/V1/FeedSpecAlerts&appID=225D5601E7729B9ED863DCA39
"20000-Oakland, CA, USA","AC Transit Trip Update",http://www.actransit.org/wp-content/uploads/GTFSWinter17B.zip,http://api.actransit.org/transit/gtfsrt/tripupdates?token=9A6257A021F944E7BE0AD32702DF23CE

Tips:

  • region_id should follow the format 10000-Portland, OR, USA, with a hyphen (-) separating the ID from the region name. The region_id field will be the name of the subdirectory under -directory in which the feed files are saved. We recommend using a large integer as the ID, following the TransitFeeds.com region pattern, to avoid collisions with directories created for downloads from TransitFeeds.com.
  • If you have more than one GTFS-rt feed (e.g., VehiclePositions and TripUpdates), use the same region_id for each. This way the GTFS data will be downloaded only once, and all of the GTFS-rt feeds will be downloaded to the same directory.
  • The title field will be used as the file name of the downloaded Protocol Buffer file.
  • gtfs_url and gtfs_rt_url can contain API keys if needed (e.g., http://developer.trimet.org/ws/V1/TripUpdate&appID=1234567890).
  • Be sure to surround any field that contains spaces or commas with double quotes (").
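The quoting and region_id rules can be illustrated with a minimal sketch. It splits one feeds.csv line into fields, then separates the ID from the region name at the first hyphen. The sketch is hypothetical: the class FeedsCsvLine is not part of the project, which may parse the file with a CSV library instead. It handles double-quoted fields but not escaped quotes ("") inside a field.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for one line of feeds.csv; handles double-quoted fields
// (which may contain commas and spaces) but not escaped quotes ("") inside fields
class FeedsCsvLine {

    static List<String> splitFields(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;            // toggle quoted state; the quote itself is dropped
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString());  // comma outside quotes ends a field
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());          // last field
        return fields;
    }

    // Splits "10000-Portland, OR, USA" into {"10000", "Portland, OR, USA"} at the first hyphen
    static String[] splitRegionId(String regionId) {
        int dash = regionId.indexOf('-');
        if (dash < 0) {
            throw new IllegalArgumentException("region_id must contain '-': " + regionId);
        }
        return new String[] { regionId.substring(0, dash), regionId.substring(dash + 1) };
    }
}
```

Applied to the first TriMet row above, splitFields yields four fields: region_id, title, gtfs_url and gtfs_rt_url. The commas inside "10000-Portland, OR, USA" stay part of the first field because they are quoted.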

Sample output

You'll see a lot of folders within the output directory, one for each transit agency:

[Image: per-agency folders in the output directory]

If you look in one of those folders, you'll see the following:

[Image: contents of one agency folder]

This contains the GTFS and GTFS-realtime source files downloaded from the agency:

  1. gtfs-zip - The GTFS data that was downloaded from the agency URL (HART, in this case) provided by the TransitFeeds.com API
  2. HART Trip Updates-xxxx.pb - The TripUpdates binary Protocol Buffer file that was downloaded from the agency URL (HART, in this case) provided by the TransitFeeds.com API, with the UTC time in milliseconds appended
  3. HART Vehicle Positions-xxxx.pb - The VehiclePositions binary Protocol Buffer file that was downloaded from the agency URL (HART, in this case) provided by the TransitFeeds.com API, with the UTC time in milliseconds appended

...as well as plain text versions of the GTFS-realtime files generated by the gtfs-realtime-validator:

  1. HART Trip Updates-xxxx.pb.txt - The plain text version of the above TripUpdates binary
  2. HART Vehicle Positions-xxxx.pb.txt - The plain text version of the above VehiclePositions binary

...and the validation results for each GTFS-realtime file (see gtfs-realtime-validator Batch Processor output examples for details):

  1. HART Trip Updates-xxxx.results.json - The validation results for the above TripUpdates binary
  2. HART Vehicle Positions-xxxx.results.json - The validation results for the above VehiclePositions binary
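The file naming pattern listed above can be sketched as follows. This assumes the appended timestamp is the value of System.currentTimeMillis() at download time, which counts milliseconds since the Unix epoch and so is UTC-based. The class OutputFileNames is hypothetical, and the project's actual naming code may differ.

```java
// Hypothetical helper reproducing the documented naming pattern:
// "<title>-<UTC millis>.pb", plus the ".pb.txt" and ".results.json" files derived from it
class OutputFileNames {

    // e.g., ("HART Trip Updates", System.currentTimeMillis())
    static String protobufName(String title, long utcMillis) {
        return title + "-" + utcMillis + ".pb";
    }

    // Plain text version written by the gtfs-realtime-validator: "...pb" -> "...pb.txt"
    static String textName(String protobufName) {
        return protobufName + ".txt";
    }

    // Validation results: "HART Trip Updates-xxxx.pb" -> "HART Trip Updates-xxxx.results.json"
    static String resultsName(String protobufName) {
        String base = protobufName.endsWith(".pb")
                ? protobufName.substring(0, protobufName.length() - ".pb".length())
                : protobufName;
        return base + ".results.json";
    }
}
```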

An Excel spreadsheet, analysis-graphs.xlsx, containing graphs that summarize all of the analyzed GTFS-realtime feeds will be generated in the root folder of the project. For example:

[Image: example graphs from analysis-graphs.xlsx]

The analysis results are also output to a JSON file, analysis-summary.json.

Implementation details

Take a look at the Main.main() method.

Here's a simplified version of what it looks like:

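import java.nio.file.Paths;

// Placeholders for the -directory, -transitFeedsApiKey and -csv command-line options
String directoryName = "your-directory";
String transitFeedsApiKey = "YOUR_TRANSIT_FEEDS.COM_API_HERE";
String csvFile = "feed-file.csv";

// Feeds are downloaded to, and results written under, this directory
TransitFeedQualityCalculator calculator = new TransitFeedQualityCalculator(Paths.get(directoryName));
if (transitFeedsApiKey != null) {
    // Download the feeds listed by the TransitFeeds.com GetFeeds API
    calculator.setTransitFeedsApiKey(transitFeedsApiKey);
}
if (csvFile != null) {
    // Download the feeds listed in a local CSV file
    calculator.setCsvDownloaderFile(csvFile);
}
// Download, validate, analyze, and export the results
calculator.calculate();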

This demonstrates the usage of the TransitFeedQualityCalculator, which performs the following steps:

  1. Download - Via TransitFeedsDownloader and CsvDownloader
  2. Validate - Via BulkFeedValidator
  3. Analyze - Via ResultsAnalyzer
  4. Export - To an Excel file via ExcelExporter and to a JSON file via Jackson
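The sketch below is a hedged illustration of the Analyze step and of how the -errorsToIgnore list affects it. It counts how many feeds report at least one error that is not in the ignore set. The input shape is an assumption: a map from each feed file name to the error IDs found in its results.json. The class ErrorSummarySketch is hypothetical, not the project's ResultsAnalyzer, whose inputs and outputs may differ.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not the project's ResultsAnalyzer
class ErrorSummarySketch {

    // Counts feeds that report at least one error ID not in errorsToIgnore.
    // errorsByFeed (assumed shape) maps a feed file name to the error IDs found for it.
    static int countFeedsWithErrors(Map<String, ? extends Collection<String>> errorsByFeed,
                                    Set<String> errorsToIgnore) {
        int count = 0;
        for (Collection<String> errors : errorsByFeed.values()) {
            Set<String> remaining = new HashSet<>(errors);
            remaining.removeAll(errorsToIgnore); // drop ignored IDs such as E017, E018
            if (!remaining.isEmpty()) {
                count++;
            }
        }
        return count;
    }
}
```

With the default ignore set, a feed whose only error is E017 would not be counted. This matches the rationale that archived files may not have been collected iteratively.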

Dependencies

Managed via Maven: