grote/osm2gtfs

Source for time data

pantierra opened this issue · 12 comments

Currently, this repository includes different implementations for cities. Particularly the trip_creators are very much exclusively targeted and programmed for a special data source. However for any new person coming to this script it is almost impossible to find out where to get those data sources, in order to run osm2gtfs with the existing implementations, for example for the sake of testing. As they are so heavily connected, the data (source) on the schedule should be provided, too.

Instead of adding the actual data, I'd suggest to include a source field to the configuration file where a valid url to the file containing the time information should be specified. The script would then download the data, save it into the data directory and use it. Probably also an argument (similar to the --refresh- ones should be implemented, e.g. --refresh-time-table).

My repo https://github.com/Skippern/GV-scraper stores the time tables in times.json, and can be refreshed with the commands get_duration.py and get_times.py, that basically will gather the data and build a new times.json, if --refresh-time-table checks for the presence of get_duration.py and get_times.py and runs these commands, than I could do a full update in just one command line

As far as I can see we already have documentation for this one here. But, does it already work, @AltNico?

I think it would be good to have a timetable attribute in the configuration file, which specifies the URL to download. And then some minor caching, as described above.

The great work of @AltNico is in this branch: https://github.com/mapanica/osm2gtfs/tree/timetable
I guess it is almost ready to be filed as a pull request. I'm just wondering about one little thing I wanted to mention here and get your comments on it:

Currently the code checks in the main class, if the standard creators are used - before the script itself knows about it -, then it enforces the timeline to be parsed, valid and loaded. What about checking only if the file exists and having then the timetable (no matter which format) getting tunneled to the trips creator, which is the one in charge to check parsing errors, etc. I'm happy to provide a patch, but didn't want to do it before asking.

grote commented

Yes, this looks a bit strange and should be improved. If this is called data_source or something similar, then it could be passed into the trips creators and can be None by default. So the individual creator can decide what to do with that information.

data can be everything. The important point is time here. What about naming it time_source?

grote commented

Maybe schedule_source is clearer?

It is and I like it most. However, potentially it can be mixed up with the transitfeed's schedule object.

grote commented

I thought of that as well and agree. However, I think it fits better for our use-case than the transifeed object. Maybe we should rename the latter (in our code) to feed or something like that.

Based on the input here, I filed a pull request #91 proposed for approval.

This has been merged in #91. Thanks everybody!