zip::zip_list can return absolute file paths
zip::zip_list() does not necessarily return file names without paths; see ?zip::zip for details. The following code demonstrates this (in a possibly OS-dependent manner, so don't worry if you can't reproduce it):
f <- file.path (tempdir (), "mtcars.csv")
write.table (mtcars, f, sep = ",")
z <- file.path (tempdir (), "my.zip")
utils::zip (z, files = f, flags = "-q")
zip::zip_list (z)
#> filename compressed_size uncompressed_size
#> 1 tmp/Rtmp8OkYhX/mtcars.csv 861 1780
#> timestamp permissions crc32 offset
#> 1 2021-06-21 08:40:22 666 16e2dfc6 0
Created on 2021-06-21 by the reprex package (v2.0.0.9000)
In such cases, the names of the items read by import_gtfs() will be the full paths, and the returned object retains those names. Everything works, but any other package expecting a GTFS feed to have standard names will then fail. This is what happens:
library (gtfsrouter)
library (gtfsio)
f <- berlin_gtfs_to_zip ()
g <- gtfsio::import_gtfs (f)
names (g)
#> [1] "tmp/Rtmp2EVj5z/calendar" "tmp/Rtmp2EVj5z/routes"
#> [3] "tmp/Rtmp2EVj5z/trips" "tmp/Rtmp2EVj5z/stop_times"
#> [5] "tmp/Rtmp2EVj5z/stops" "tmp/Rtmp2EVj5z/transfers"
Created on 2021-06-21 by the reprex package (v2.0.0.9000)
I'll submit a PR to fix the names once #16 has been merged. The package will then need a final check_gtfs_format(), gtfsio_is_valid(), or similar function that confirms that everything has the expected format: that all required tables are present, and that all required columns of those tables are also present. We can worry about that once #16 and this issue have been addressed. 👍
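One way the name fix could look is sketched below. This is only an illustration under stated assumptions, not the contents of the eventual PR: strip_gtfs_paths() is a hypothetical helper name, and the list g is a made-up stand-in for an imported feed that carries the path-prefixed names shown above.

```r
# Hypothetical helper (not part of gtfsio): reduce archive entry names
# to bare table names.
strip_gtfs_paths <- function (x) {
    # basename () drops any directory components;
    # tools::file_path_sans_ext () drops a trailing ".txt" if one is present
    tools::file_path_sans_ext (basename (x))
}

# Stand-in for an imported feed, using names like those reported above.
# The table contents are invented for illustration.
g <- list (
    "tmp/Rtmp2EVj5z/calendar" = data.frame (service_id = "a"),
    "tmp/Rtmp2EVj5z/routes" = data.frame (route_id = "r1")
)

names (g) <- strip_gtfs_paths (names (g))
names (g)
#> [1] "calendar" "routes"
```

Applying the same helper to the filename column returned by zip::zip_list() before reading would have the same effect, at the cost of needing the original entry names for extraction.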
Great catch, as always, @mpadge. I can fully reproduce this behaviour on my computer as well.
Regarding the check_gtfs_format() function you propose, I suggest using assert_gtfs(), which I created as a validator for the gtfs class (the name is not great, I know, but both {gtfstools} and {tidytransit} had their own validate_gtfs() already, so I opted for the current one).
Currently it checks if all elements inside the GTFS are named and if they inherit from data.frame, but it doesn't check the content of their names. What do you think?
The names are official, so checking them should be part of assert_gtfs(). I would suggest that it assert that all required names exist, and that any other names match one of the optional names and nothing else. I'll leave you to do that, and I'll PR with a fix for this. Thanks!!
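The check suggested above could be sketched roughly as follows. This is a hypothetical illustration, not the actual assert_gtfs() implementation: check_gtfs_names() is an invented name, and the two table lists are deliberately incomplete stand-ins for the official GTFS reference (calendar, for instance, is conditionally required there but is treated as optional here for simplicity).

```r
# Illustrative and incomplete lists; the GTFS reference is authoritative.
required_tables <- c ("agency", "stops", "routes", "trips", "stop_times")
optional_tables <- c (
    "calendar", "calendar_dates", "shapes",
    "frequencies", "transfers", "feed_info"
)

# Hypothetical checker: errors if a required table is missing or if any
# name is neither required nor optional; otherwise returns x invisibly.
check_gtfs_names <- function (x,
                              required = required_tables,
                              optional = optional_tables) {
    nms <- names (x)

    missing_tables <- setdiff (required, nms)
    if (length (missing_tables) > 0) {
        stop ("Missing required tables: ",
              paste (missing_tables, collapse = ", "))
    }

    unknown <- setdiff (nms, c (required, optional))
    if (length (unknown) > 0) {
        stop ("Unrecognised table names: ",
              paste (unknown, collapse = ", "))
    }

    invisible (x)
}
```

With this sketch, a feed whose elements are named like "tmp/Rtmp2EVj5z/routes" would fail at the first check, because none of the required names are present in their standard form.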