r-transit/gtfsio

zip::zip_list can return absolute file paths

Closed this issue · 2 comments

zip::zip_list() does not necessarily return bare file names; the entries can include directory paths (see ?zip::zip for details). The following code demonstrates this (in a possibly OS-dependent manner, so don't worry if you can't reproduce it):

f <- file.path (tempdir (), "mtcars.csv")
write.table (mtcars, f, sep = ",")
z <- file.path (tempdir (), "my.zip")
utils::zip (z, files = f, flags = "-q")
zip::zip_list (z)
#>                    filename compressed_size uncompressed_size
#> 1 tmp/Rtmp8OkYhX/mtcars.csv             861              1780
#>             timestamp permissions    crc32 offset
#> 1 2021-06-21 08:40:22         666 16e2dfc6      0

Created on 2021-06-21 by the reprex package (v2.0.0.9000)

In such cases, the names of the items read by import_gtfs() will be the full paths, and the returned object retains those names. Everything in gtfsio itself still works, but any other package expecting a GTFS feed to have standard table names will then fail. This is what the names look like:

library (gtfsrouter)
library (gtfsio)
f <- berlin_gtfs_to_zip ()
g <- gtfsio::import_gtfs (f)
names (g)
#> [1] "tmp/Rtmp2EVj5z/calendar"   "tmp/Rtmp2EVj5z/routes"    
#> [3] "tmp/Rtmp2EVj5z/trips"      "tmp/Rtmp2EVj5z/stop_times"
#> [5] "tmp/Rtmp2EVj5z/stops"      "tmp/Rtmp2EVj5z/transfers"


I'll submit a PR to fix the names once #16 has been merged. The package will then need a final check_gtfs_format() or gtfsio_is_valid() or similar function that confirms that everything has the expected format: that all required tables are present, and that all required columns of the required tables are also present. We can worry about that once #16 and this issue have been addressed.
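
For illustration only, the name fix could look something like the sketch below. It is not the actual PR: strip_zip_paths() is a hypothetical helper, and the ".txt" entries are made-up examples modelled on the output above. It assumes the zip entries are plain character paths and simply drops any leading directories and the file extension.

```r
# Hypothetical helper (not part of gtfsio): reduce zip entries to bare
# table names, whatever directory prefix the zip archive stored.
strip_zip_paths <- function (filenames) {
    # basename() drops leading directories;
    # tools::file_path_sans_ext() drops the trailing ".txt"
    tools::file_path_sans_ext (basename (filenames))
}

# Made-up entries in the style of the zip_list() output shown above
entries <- c ("tmp/Rtmp2EVj5z/calendar.txt",
              "tmp/Rtmp2EVj5z/routes.txt",
              "stops.txt")
strip_zip_paths (entries)
#> [1] "calendar" "routes"   "stops"
```

Entries that already have no path, like "stops.txt" here, pass through unchanged apart from losing the extension.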

Great catch, as always, @mpadge. I can fully reproduce this behaviour on my computer as well.

Regarding the check_gtfs_format() function you propose, I suggest using assert_gtfs(), which I created as a validator for the gtfs class (the name is not great, I know, but both {gtfstools} and {tidytransit} had their own validate_gtfs() already, so I opted for the current one).
Currently it checks whether all elements inside the GTFS object are named and whether they inherit from data.frame, but it doesn't check the names themselves. What do you think?

The names are official, and so checking them should be part of assert_gtfs(). I would suggest that it should assert that all required names exist, and that any other names match one of the optional names and nothing else. I'll leave you to do that, and I'll submit a PR with a fix for this. Thanks!!
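
A rough sketch of the kind of name check meant here, not the actual assert_gtfs() code: check_table_names() is a hypothetical function, and the two name vectors are illustrative subsets assumed for the example only (the official GTFS reference defines the real lists, including conditionally required files).

```r
# Illustrative subsets only, assumed for this sketch; see the GTFS
# reference for the authoritative lists.
required_tables <- c ("agency", "stops", "routes", "trips", "stop_times")
optional_tables <- c ("calendar", "calendar_dates", "transfers",
                      "shapes", "frequencies", "feed_info")

# Hypothetical check: every required name is present, and every other
# name is one of the optional names.
check_table_names <- function (x) {
    nms <- names (x)
    missing_tables <- setdiff (required_tables, nms)
    unknown_tables <- setdiff (nms, c (required_tables, optional_tables))
    if (length (missing_tables) > 0)
        stop ("missing required tables: ",
              paste (missing_tables, collapse = ", "))
    if (length (unknown_tables) > 0)
        stop ("unrecognised table names: ",
              paste (unknown_tables, collapse = ", "))
    invisible (x)
}

# A feed whose names kept the zip paths is rejected
bad <- list ("tmp/Rtmp2EVj5z/routes" = data.frame ())
res <- tryCatch (check_table_names (bad), error = function (e) e)
inherits (res, "error")
#> [1] TRUE
```

With names like those in the output above, both conditions fail, so a check along these lines would have flagged the problem at import time.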