ipeaGIT/gtfstools

New function stop_times_to_frequencies()

rafapereirabr opened this issue · 4 comments

A function that creates a frequencies.txt file based on stop_times.txt. This would help address a common issue for r5r users; see this issue.

Sorry, but how do I create this function that creates a frequencies.txt file based on stop_times.txt? Is there a link to R code for this function (if it has already been created)?
The hyperlink on "issue" takes me back to my question page.

I could not find it on the gtfstools page.

I followed ipeaGIT/r5r#321 here as I was also interested in the time_window argument in r5r::travel_time_matrix(). I found a get_route_frequency() function in the tidytransit package that looks useful.

The same logic could be used to get the headway of all routes at different time periods and combine the results to create a frequencies.txt file.

I might try to create a stop_times_to_frequency function and will comment here if I do. In any case, I thought it could be useful to mention the tidytransit function here.

I have a draft stop_times_to_frequencies() function below. Some notes on the logic:

  • I assign each trip from stop_times to a time interval. These time intervals are a custom input so you can have the headway of a trip at specified time intervals during the day
  • The trip_id in a GTFS feed refers to a unique vehicle departure, so trips that follow the same itinerary from A to B (i.e. buses going on bus route x with a specific trip_headsign/direction) have different trip_ids.
  • To get headway_secs, we need to identify and group these trips somehow. For each trip, I create a column that has the stop_ids in the order that they are visited by the trip (below I call this column stop_id_order).
  • I also join the service_id to the stop_times. Different service_ids reflect the same trip on different days, so a trip will be repeated multiple times in stop_times.txt. If we don't group by service_id, then we will add together trips that run on different days, which would inflate the number of departures and so make headway_secs too small.
  • I then group by stop_id_order, service_id, start_time, end_time and get the number of departures, which is used to get the headway.
  • I add the frequencies file to the feed and filter the feed so that it only keeps the trip_ids that are in the frequencies file
  • I don't know if any edits should be made to the stop_times.txt (or if it should be removed), but I keep it as is
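
As a toy illustration of the grouping logic in the notes above, here is a minimal base-R sketch. It is separate from the draft function and is not a drop-in replacement for it: the data, the to_secs() helper, and the single 09:00 to 12:00 window are all made up for the example.

```r
# Hypothetical toy data: three trips on pattern A-B, one on pattern B-A.
stop_times <- data.frame(
  trip_id        = c("t1", "t1", "t2", "t2", "t3", "t3", "t4", "t4"),
  stop_id        = c("A", "B", "A", "B", "A", "B", "B", "A"),
  stop_sequence  = c(1, 2, 1, 2, 1, 2, 1, 2),
  departure_time = c("09:00:00", "09:10:00", "09:30:00", "09:40:00",
                     "10:30:00", "10:40:00", "09:15:00", "09:25:00"),
  stringsAsFactors = FALSE
)
trips <- data.frame(
  trip_id    = c("t1", "t2", "t3", "t4"),
  service_id = "weekday",
  stringsAsFactors = FALSE
)

# Hypothetical helper: "HH:MM:SS" to seconds since midnight.
to_secs <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

# Sort so that stop_ids are pasted in the order they are visited.
stop_times <- stop_times[order(stop_times$trip_id, stop_times$stop_sequence), ]

# One value per trip: its stop pattern and its departure from the first stop.
pattern   <- tapply(stop_times$stop_id, stop_times$trip_id, paste, collapse = "-")
first_dep <- tapply(to_secs(stop_times$departure_time), stop_times$trip_id,
                    function(v) v[1])

per_trip <- data.frame(
  trip_id       = names(pattern),
  stop_id_order = as.vector(pattern),
  first_dep     = as.vector(first_dep),
  stringsAsFactors = FALSE
)
per_trip <- merge(per_trip, trips, by = "trip_id")

# Keep trips whose first departure falls in the window [09:00:00, 12:00:00).
win_start <- to_secs("09:00:00")
win_end   <- to_secs("12:00:00")
in_win <- per_trip[per_trip$first_dep >= win_start & per_trip$first_dep < win_end, ]

# Count departures per pattern and service, then headway = window length / count.
headways <- aggregate(trip_id ~ stop_id_order + service_id, data = in_win, FUN = length)
names(headways)[names(headways) == "trip_id"] <- "vehicles"
headways$headway_secs <- round((win_end - win_start) / headways$vehicles)

print(headways)
```

Note that dividing the window length by the number of departures gives an average headway over the window; it does not reflect how evenly the departures are actually spaced.
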
library(dplyr)  # provides %>%, tibble() and join_by() (join_by() needs dplyr >= 1.1.0)

stop_times_to_frequencies <- function(gtfs,
                                      time_ranges = tibble(start_time = c("00:00:00", "09:00:00", "12:00:00", "19:00:00"),
                                                           end_time =   c("09:00:00", "12:00:00", "19:00:00", "23:59:00"))){
  # PURPOSE: convert a stop_times based feed to a frequency based feed
  # INPUT:
  # gtfs: the gtfs feed you want to edit
  # time_ranges: the day is split into multiple time slots. We calculate the frequency of trip in each of these slots
  #              format: a tibble with columns "start_time" and "end_time"
  # OUTPUT
  # a frequency based gtfs feed

  message(" ... converting time ranges to hms ... ")
  # ----- convert time ranges to hms
  time_ranges <- time_ranges %>%
    mutate(across(everything(), hms::as_hms))


  # --- Calculate the headway

  # 1. identify trips with the same itinerary. Trip IDs are unique for every departing vehicle,
  #    so we group vehicles that have the same stop sequence

  # Create a column to identify same trips (trips with the same stop sequence)
  message("... identifying trips with same stop sequence ...")

  trips_stop_sequence <- gtfs$stop_times %>%
    # sort first so that stop_ids are pasted in the order they are visited
    arrange(trip_id, stop_sequence) %>%
    group_by(trip_id) %>%
    mutate(stop_id_order = paste0(stop_id, collapse = '-')) %>%
    # keep only one row per unique trip (its first stop)
    # we use stop_sequence == min(stop_sequence) instead of == 0, as stop_sequence doesn't have to start from 0
    # the filter runs while still grouped, so the minimum is taken within each trip
    filter(stop_sequence == min(stop_sequence)) %>%
    ungroup() %>%
    # some arrival times are bigger than 24 - these cause errors when converting to time
    filter(as.character(arrival_time) <= "23:59:59") %>%
    mutate(arrival_time = hms::as_hms(arrival_time))

  # 2. Assign a time range to each trip based on the departure from the first stop
  message("... assigning trips to time ranges ... ")

  trips_time_ranges <- trips_stop_sequence %>%
    inner_join(time_ranges,
               join_by(arrival_time >= start_time, arrival_time < end_time))

  # 3. Get the headway of each trip

  # add the service_id to each trip
  trips_time_ranges <- trips_time_ranges %>%
    left_join(gtfs$trips %>% select(trip_id, service_id), by = "trip_id")

  # calculate number of buses for each unique trip + time range + service_id combination
  message("... calculating headways ... ")

  trips_headways <- trips_time_ranges %>%
    group_by(stop_id_order, service_id, start_time, end_time) %>%
    summarise(vehicles = n(),
              # we don't need all of the trip IDs (all trips in the same group have the same itinerary)
              trip_id = first(trip_id),
              # drop the grouping explicitly (also silences the summarise() grouping message)
              .groups = "drop")

  # get the headway (time between buses = time period / no. of buses)
  trips_headways <- trips_headways %>%
    # ask for seconds explicitly so the result does not depend on the difftime units
    mutate(headway_secs = round(as.numeric(end_time - start_time, units = "secs") / vehicles)) %>%
    # keep necessary columns only
    select(trip_id, start_time, end_time, headway_secs)


  # 4. edit the gtfs feed to produce a frequency-based feed
  message("... replacing stop_times with frequencies ... ")

  # # remove stop_times file
  # gtfs <- gtfs[names(gtfs) != "stop_times"]

  # add "frequencies" file to the gtfs feed
  gtfs$frequencies <- trips_headways

  # filter feed to only keep the trip ids in the new "frequencies" file
  gtfs <- gtfs %>%
    #gtfstools::filter_by_trip_id(trip_id =  .$frequencies$trip_id)
    tidytransit::filter_feed_by_trips(trip_ids = .$frequencies$trip_id)

  return(gtfs)

}

The function works with GTFS feeds read in using both gtfstools and tidytransit. However, with gtfstools, the filter_feed_by_trips() function raised an error:

Error in vapply(x[[file]], class, character(1)) :
values must be length 1,
but FUN(X[[2]]) result is length 2

I haven't tried to debug this yet, as the filtering function in tidytransit was working.
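
One possible cause, offered as a guess that has not been checked against the gtfstools source: the frequencies table built above stores start_time and end_time as hms values, and hms objects carry two classes ("hms" and "difftime"). Calling vapply(..., class, character(1)) on a table with such a column fails with the same message, with column 2 being start_time. The base-R sketch below reproduces that failure on a hand-built stand-in for an hms column (the hms package is not loaded) and shows that plain "HH:MM:SS" strings avoid it.

```r
# Stand-in for a frequencies table: start_time mimics an hms value by carrying
# the same two classes. This is an illustrative assumption, not gtfstools code.
tbl <- list(
  trip_id    = "t1",
  start_time = structure(32400, class = c("hms", "difftime"), units = "secs")
)

# class() returns two values for the second column, so vapply() fails.
err <- tryCatch(
  vapply(tbl, class, character(1)),
  error = function(e) conditionMessage(e)
)
print(err)

# Converting the column to a plain "HH:MM:SS" string leaves one class per column.
secs <- as.integer(unclass(tbl$start_time))
tbl$start_time <- sprintf("%02d:%02d:%02d",
                          secs %/% 3600L, (secs %% 3600L) %/% 60L, secs %% 60L)
ok <- vapply(tbl, class, character(1))
print(ok)
```

If this is indeed the cause, converting the time columns of the frequencies table to character before filtering a gtfstools feed (for real hms columns, as.character() should give "HH:MM:SS" strings) might be enough, but that is untested here.
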