An R package for downloading station data from the National Rail Data Portal API

Primary language: R. License: MIT.

nrstations

nrstations is a small R package for downloading station data across Great Britain from the National Rail Data Portal API.

Overview

Open data on the rail industry in Great Britain is available through a series of API feeds from the National Rail Data Portal. Documentation on all feeds available through the National Rail Data Portal (NRDP) can be found here. This package provides a simple way to extract data from the NRDP's KnowledgeBase Stations XML feed.

Usage

The package provides one primary function, fetch_stations_list, to fetch and parse all data from the Stations XML feed into an R object with class 'list'. Secondary functions prefixed get_* return tibbles for predetermined subsets of the list object generated by fetch_stations_list.

Each get_* function returns a tibble based on the major tags of the data, as shown in the XML schema, where each row is one station. Each tibble returned by get_* provides basic details on the name and location of each station. Where mandatory data may be recorded within optional tags, for example 'Open' and 'Available', data from the 'Available' tag has been extracted. The get_* functions available are:

get_station_tags
get_fare_tags
get_facility_tags
get_accessibility_tags
get_interchange_tags
get_all_station_tags

The final function, get_all_station_tags, returns a tibble combining the data from the preceding five get_* functions.

Additional data can be extracted from the list returned by fetch_stations_list by using ordinary R syntax for extracting data from lists.
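As a rough illustration of extracting data with ordinary list syntax, the sketch below uses a small hand-built list in place of the real result. The structure and element names (Name, CrsCode, StationFacilities, Toilets) are hypothetical and chosen only for illustration; inspecting the actual list with str() will show the real ones.

```r
# A hand-built stand-in for the list returned by fetch_stations_list().
# The structure and element names here are assumptions for illustration only.
stations <- list(
  list(Name = "Alpha Road", CrsCode = "AAA",
       StationFacilities = list(Toilets = "true")),
  list(Name = "Beta Park", CrsCode = "BBB",
       StationFacilities = list(Toilets = "false"))
)

# Extract one element from a single station with [[ and $.
first_name <- stations[[1]]$Name

# Extract the same field from every station with vapply().
crs_codes <- vapply(stations, function(s) s$CrsCode, character(1))

# Reach into a nested element; fall back to NA if it is missing.
has_toilets <- vapply(stations, function(s) {
  value <- s$StationFacilities$Toilets
  if (is.null(value)) NA_character_ else value
}, character(1))

# Inspect the top two levels of the structure.
str(stations, max.level = 2)
```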

Call fetch_stations_list first and store the returned list in a variable, which can then be passed to the get_* functions. Fetching and parsing the Stations XML feed into R as a list takes around one minute, so reusing the stored list avoids repeating the download for each get_* call.
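Because the fetch takes around a minute, one option is to cache the parsed list on disk between sessions. The helper below is a minimal sketch and not part of nrstations: the name cached_fetch and the file name are hypothetical, and the fetch step is passed in as a function so the helper assumes nothing about the package's arguments.

```r
# Hypothetical helper (not part of nrstations): return a cached copy of the
# stations list if one exists, otherwise run fetch_fn() and cache the result.
cached_fetch <- function(path, fetch_fn, refresh = FALSE) {
  if (!refresh && file.exists(path)) {
    return(readRDS(path))
  }
  result <- fetch_fn()
  saveRDS(result, path)
  result
}

# Example usage (requires an NRDP account, so it is left commented out):
# stations <- cached_fetch("stations.rds", function() {
#   nrstations::fetch_stations_list("nrdp_user@example.com", "nrdp_password_example")
# })
# facilities <- nrstations::get_facility_tags(stations)
```

Passing refresh = TRUE forces a new download when the cached copy may be out of date.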

The function fetch_stations_xml fetches and parses all data from the Stations XML feed as an R object with class 'xml_document'. The get_* functions will not work with 'xml_document' objects. To extract data from an 'xml_document' in R the xml2 package (available here) is recommended.
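For readers working with the 'xml_document' directly, the sketch below shows typical xml2 usage on a small inline document rather than the real feed. The tag names are assumptions for illustration. The real feed may also declare an XML namespace, in which case xml2::xml_ns_strip() or namespace-qualified XPath expressions would likely be needed.

```r
library(xml2)

# A tiny stand-in document; tag names are illustrative, not the real schema.
doc <- read_xml(
  "<StationList>
     <Station><CrsCode>AAA</CrsCode><Name>Alpha Road</Name></Station>
     <Station><CrsCode>BBB</CrsCode><Name>Beta Park</Name></Station>
   </StationList>"
)

# Select every Station node, then pull one child value from each.
station_nodes <- xml_find_all(doc, "//Station")
station_names <- xml_text(xml_find_first(station_nodes, "./Name"))
station_codes <- xml_text(xml_find_first(station_nodes, "./CrsCode"))

# Assemble the extracted values into a data frame, one row per station.
stations_df <- data.frame(
  crs_code = station_codes,
  name = station_names,
  stringsAsFactors = FALSE
)
```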

Example

The following example fetches and parses the Stations XML feed into an R list, and then creates a tibble with details of the facilities available at each station. When calling a fetch_* function, pass your NRDP username and password as quoted strings to the relevant arguments.

stations <- nrstations::fetch_stations_list("nrdp_user@example.com", "nrdp_password_example")
stations_tibble <- nrstations::get_facility_tags(stations)

Prerequisites

To access the NRDP API feeds you need to create an NRDP account. Ensure that all subscription types are selected. The email address and password chosen for your NRDP account are passed as arguments to the fetch_* functions when fetching data from the Stations XML feed API.
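To avoid hard-coding credentials in scripts, one common pattern is to read them from environment variables. This is a sketch rather than a feature of the package: the helper nrdp_credentials and the variable names NRDP_USER and NRDP_PASSWORD are hypothetical.

```r
# Hypothetical helper: read NRDP credentials from environment variables and
# stop with a clear message if either is missing.
nrdp_credentials <- function(user_var = "NRDP_USER",
                             password_var = "NRDP_PASSWORD") {
  user <- Sys.getenv(user_var, unset = "")
  password <- Sys.getenv(password_var, unset = "")
  if (!nzchar(user) || !nzchar(password)) {
    stop("Set the ", user_var, " and ", password_var,
         " environment variables before fetching data.", call. = FALSE)
  }
  list(user = user, password = password)
}

# Example usage (needs a real account, so it is left commented out):
# creds <- nrdp_credentials()
# stations <- nrstations::fetch_stations_list(creds$user, creds$password)
```

The variables can be set for every R session by adding them to a .Renviron file.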

Error messages

The two error messages you are most likely to see when using this package are:

  • HTTP Status 403 - Access is denied
  • HTTP Status 401 - Unauthorized: Authentication token was either missing or invalid

If you receive an HTTP Status 403 error, check that "On Demand Data Feeds" is a selected subscription type in your NRDP account details. If you receive an HTTP Status 401 error, check that you entered the correct NRDP username and password for your account when calling the fetch_* functions.
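The checks above could be attached to the errors themselves. The wrapper below is a sketch under assumptions: fetch_with_hint is a hypothetical name, and it assumes the status code appears in the error message text, as in the two messages listed above.

```r
# Hypothetical wrapper: run fetch_fn() and, if it fails with one of the two
# HTTP errors described above, re-raise the error with a hint attached.
# Matching on the message text is an assumption about how errors surface.
fetch_with_hint <- function(fetch_fn) {
  tryCatch(
    fetch_fn(),
    error = function(e) {
      msg <- conditionMessage(e)
      hint <- if (grepl("403", msg, fixed = TRUE)) {
        "Check that 'On Demand Data Feeds' is selected in your NRDP account."
      } else if (grepl("401", msg, fixed = TRUE)) {
        "Check the NRDP username and password passed to the fetch_* function."
      } else {
        NULL
      }
      # Unrecognised errors are re-raised unchanged.
      if (is.null(hint)) stop(e)
      stop(paste(msg, hint, sep = "\n"), call. = FALSE)
    }
  )
}

# Example usage (needs a real account, so it is left commented out):
# stations <- fetch_with_hint(function() {
#   nrstations::fetch_stations_list("nrdp_user@example.com", "nrdp_password_example")
# })
```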

Installation

Install from GitHub using devtools.

install.packages("devtools")
devtools::install_github("dempseynoel/nrstations")