ipeaGIT/gtfstools

Create filter by day of the week

rafapereirabr opened this issue · 4 comments

The gtfs2gps package already has a filter_week_days() function that we could use as a base.

The new function will probably look something like filter_by_weekday(gtfs, weekday, keep), where weekday can be any of c("mon", "tue", "wed", "thu", "fri", "sat", "sun").
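
A minimal base-R sketch of the proposed interface, assuming the feed's calendar is a plain data frame with the standard GTFS weekday columns (full names such as monday rather than the mon/tue abbreviations above). filter_calendar_by_weekday() is a hypothetical name. This is not the gtfs2gps or gtfstools code, and it only touches the calendar table.

```r
# Hypothetical helper (not the gtfs2gps/gtfstools code): keep or drop
# calendar rows whose service runs on at least one of the given weekdays.
filter_calendar_by_weekday <- function(calendar, weekday, keep = TRUE) {
  valid_days <- c("monday", "tuesday", "wednesday", "thursday",
                  "friday", "saturday", "sunday")
  # fail early on unknown weekday names
  bad <- setdiff(weekday, valid_days)
  if (length(bad) > 0) stop("invalid weekday: ", paste(bad, collapse = ", "))

  # TRUE where any of the requested weekday columns equals 1
  runs_on_any <- rowSums(calendar[, weekday, drop = FALSE] == 1) > 0
  if (keep) calendar[runs_on_any, ] else calendar[!runs_on_any, ]
}

# toy calendar with only the columns needed for this example
calendar <- data.frame(
  service_id = c("USD", "U__", "_S_", "__D"),
  monday = c(1, 1, 0, 0),
  saturday = c(1, 0, 1, 0),
  sunday = c(1, 0, 0, 1),
  stringsAsFactors = FALSE
)

filter_calendar_by_weekday(calendar, c("monday", "sunday"))$service_id
filter_calendar_by_weekday(calendar, c("monday", "sunday"), keep = FALSE)$service_id
```

Validating the weekday names up front means a typo such as "mon" fails loudly instead of silently matching nothing.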

Looking at the function already implemented in the gtfs2gps package, the new function should also handle GTFS feeds that only have a calendar_dates file (and no calendar file).
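
For a feed without a calendar table, the weekday has to be derived from the dates themselves. A possible sketch, assuming calendar_dates follows the GTFS spec (date as YYYYMMDD, exception_type 1 for an added date and 2 for a removed one). services_on_weekday_from_dates() is a hypothetical helper, and treating "has at least one added date on that weekday" as "runs on that weekday" is an assumption, not necessarily what gtfs2gps does.

```r
# Hypothetical helper: service_ids with at least one added date
# (exception_type == 1) that falls on one of the requested weekdays.
services_on_weekday_from_dates <- function(calendar_dates, weekday) {
  # as.POSIXlt()$wday counts 0 = Sunday ... 6 = Saturday, regardless of locale
  day_names <- c("sunday", "monday", "tuesday", "wednesday",
                 "thursday", "friday", "saturday")
  dates <- as.Date(as.character(calendar_dates$date), format = "%Y%m%d")
  date_weekday <- day_names[as.POSIXlt(dates)$wday + 1]

  added <- calendar_dates$exception_type == 1
  unique(calendar_dates$service_id[added & date_weekday %in% weekday])
}

# toy calendar_dates: 2024-01-01 is a Monday, 2024-01-06 a Saturday,
# 2024-01-07 a Sunday
calendar_dates <- data.frame(
  service_id = c("extra_mon", "extra_sat", "extra_sun", "removed_sun"),
  date = c(20240101L, 20240106L, 20240107L, 20240107L),
  exception_type = c(1L, 1L, 1L, 2L),
  stringsAsFactors = FALSE
)

services_on_weekday_from_dates(calendar_dates, c("monday", "sunday"))
```

as.POSIXlt()$wday is used instead of weekdays() because the latter returns day names in the session's locale.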

filter_by_weekday() was introduced in 2cab7b0. It includes a combine argument that controls whether multiple days of the week are combined with OR or with AND. From the function examples:

# read gtfs
data_path <- system.file("extdata/spo_gtfs.zip", package = "gtfstools")
gtfs <- read_gtfs(data_path)

object.size(gtfs)
#> 811304 bytes

# keeps entries related to services that run EITHER on monday OR on sunday
smaller_gtfs <- filter_by_weekday(gtfs, weekday = c("monday", "sunday"))
smaller_gtfs$calendar[, c("service_id", "monday", "sunday")]
#>     service_id monday sunday
#>  1:        USD      1      1
#>  2:        U__      1      0
#>  3:        US_      1      0
#>  4:        _SD      0      1
#>  5:        __D      0      1
#>  6:        USD      1      1
#>  7:        U__      1      0
#>  8:        US_      1      0
#>  9:        _SD      0      1
#> 10:        __D      0      1
object.size(smaller_gtfs)
#> 811248 bytes

# keeps entries related to services that run on monday AND on sunday
smaller_gtfs <- filter_by_weekday(
  gtfs,
  weekday = c("monday", "sunday"),
  combine = "and"
)
smaller_gtfs$calendar[, c("service_id", "monday", "sunday")]
#>    service_id monday sunday
#> 1:        USD      1      1
#> 2:        USD      1      1
object.size(smaller_gtfs)
#> 762152 bytes

# drops entries related to services that run EITHER on monday OR on sunday
# the resulting gtfs shouldn't include any trips running on these days
smaller_gtfs <- filter_by_weekday(
  gtfs,
  weekday = c("monday", "sunday"),
  keep = FALSE
)
smaller_gtfs$calendar[, c("service_id", "monday", "sunday")]
#>    service_id monday sunday
#> 1:        _S_      0      0
#> 2:        _S_      0      0
object.size(smaller_gtfs)
#> 19912 bytes

# drops entries related to services that run on monday AND on sunday
# the resulting gtfs may include trips that run on these days, but no trips
# that run on both these days
smaller_gtfs <- filter_by_weekday(
  gtfs,
  weekday = c("monday", "sunday"),
  combine = "and",
  keep = FALSE
)
smaller_gtfs$calendar[, c("service_id", "monday", "sunday")]
#>     service_id monday sunday
#>  1:        U__      1      0
#>  2:        US_      1      0
#>  3:        _SD      0      1
#>  4:        __D      0      1
#>  5:        _S_      0      0
#>  6:        U__      1      0
#>  7:        US_      1      0
#>  8:        _SD      0      1
#>  9:        __D      0      1
#> 10:        _S_      0      0
object.size(smaller_gtfs)
#> 69880 bytes
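
The four examples above boil down to two steps on the calendar table: fold the per-weekday conditions with OR or AND (combine), then optionally negate the result (keep = FALSE). A minimal base-R sketch, assuming a toy calendar with one row per distinct service_id from the outputs above and a hypothetical trips table. select_service_ids() is a made-up name, and the sketch does not try to reproduce how gtfstools propagates the filter to the rest of the feed.

```r
# Sketch: fold one logical vector per weekday with | or &, negate it when
# keep = FALSE, and return the matching service_ids.
select_service_ids <- function(calendar, weekday, combine = c("or", "and"),
                               keep = TRUE) {
  combine <- match.arg(combine)
  conditions <- lapply(weekday, function(day) calendar[[day]] == 1)
  matched <- Reduce(if (combine == "or") `|` else `&`, conditions)
  if (!keep) matched <- !matched
  calendar$service_id[matched]
}

# one row per distinct service shown in the outputs above
calendar <- data.frame(
  service_id = c("USD", "U__", "US_", "_SD", "__D", "_S_"),
  monday = c(1, 1, 1, 0, 0, 0),
  sunday = c(1, 0, 0, 1, 1, 0),
  stringsAsFactors = FALSE
)
# hypothetical trips, one per service
trips <- data.frame(
  trip_id = paste0("trip_", calendar$service_id),
  service_id = calendar$service_id,
  stringsAsFactors = FALSE
)

days <- c("monday", "sunday")
select_service_ids(calendar, days)                    # OR, keep
select_service_ids(calendar, days, combine = "and")   # AND, keep
select_service_ids(calendar, days, keep = FALSE)      # OR, drop
dropped_and <- select_service_ids(calendar, days, "and", keep = FALSE)
trips$trip_id[trips$service_id %in% dropped_and]      # AND, drop
```

Negating after combining is why the last example still keeps monday-only and sunday-only services: dropping services that run on both days is not the same as dropping services that run on either.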

For now, the function only uses the calendar table to filter, since also using the calendar_dates table would add a lot of complexity. I'm closing this issue for now, but we can revisit it if this becomes a problem in the future.
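
One way to check what a calendar-only filter misses in a given feed is to list the service_ids that appear only in calendar_dates, since those have no weekday columns to filter on. A small sketch with hypothetical data; services_missing_from_calendar() is an illustrative name, not part of gtfstools. Exceptions in calendar_dates for services that do appear in calendar (such as the removed date for U__ below) are likewise not considered by a calendar-only filter.

```r
# Sketch: service_ids referenced only in calendar_dates. A filter that reads
# just the calendar table has no weekday columns for these services.
services_missing_from_calendar <- function(calendar, calendar_dates) {
  setdiff(unique(calendar_dates$service_id), calendar$service_id)
}

calendar <- data.frame(service_id = c("USD", "U__"), stringsAsFactors = FALSE)
calendar_dates <- data.frame(
  service_id = c("U__", "holiday_special"),
  date = c(20240101L, 20241225L),
  exception_type = c(2L, 1L),  # 2 = service removed, 1 = service added
  stringsAsFactors = FALSE
)

services_missing_from_calendar(calendar, calendar_dates)
```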

Brilliant!