Validation Result for Calendar Validators
dancesWithCycles opened this issue · 10 comments
Hi folks,
Thank you so much for providing and maintaining this repository. I am highly interested in any calendar-related validators and validation business rules. I called gtfsvtor like this:
~/gtfsvtor/build-tpx200-69-nov-16-2021/gtfsvtor/bin/gtfsvtor -o vtor-res-top-level-dhid-nov-16-2021.html --numThreads 4 -v -c config.properties --jsonAppend --jsonOutput 'vtor-res-top-level-dhid-nov-16-2021.json' pxypihdrpv
The config file config.properties contains the following rules.
validator.CalendarValidator.checkEmptyCalendars=true
validator.CalendarValidator.checkExpired=true
validator.CalendarValidator.expiredCutoffDate=2020/10/30
validator.CalendarValidator.checkFuture=true
validator.CalendarValidator.futureCutoffDate=2022/10/29
validator.CalendarValidator.maxDaysWithoutService=5
validator.CalendarValidator.minDaysForCheckingNoServiceException=70
validator.CalendarDateStreamingValidator.minYearInThePast=2019
validator.CalendarDateStreamingValidator.validator.maxYearInTheFuture=2023
validator.CalendarStreamingValidator.minYearInThePast=2019
validator.CalendarStreamingValidator.validator.maxYearInTheFuture=2023
I configured both HTML and JSON output. However, I cannot find any calendar-related results in either output file. Are those results reported in a different way? Does that mean the validated feed is valid? Is my configuration wrong?
I appreciate any help that points me in the right direction.
Cheers!
In all probability, if no error or warning is reported in the output, the GTFS is valid (at least with regard to the validation rules that are implemented). Calendar errors are output like the others. Do you expect any error to be reported?
Hi @laurentg ,
I do not know if the GTFS feed is valid or not. I am studying the rules and trying to understand them. I did not find any documentation of the rules except the output of the --help option. So, I am learning by doing. I am planning the following next steps for my study.
- Tweak the config in a way I receive calendar errors in the output.
- Find GTFS feeds that contain calendar errors.
Do you know any example feeds that contain errors that you used while testing gtfsvtor?
Do you have any suggestions for me on how to study the calendar related rules?
If I learn something about the rules that is worth documenting, how would you like me to add it to this repository?
Cheers!
If you look into the src/main/resources/data directory in the source code, you will find many GTFS feeds with errors (used by the unit tests). For calendars, there are some:
- only_calendar_dates
- toomanydayswoservice
- missing_weekday_column
- missing_calendar
- bad_date_format
- bogus_calendars
- duplicate_schedule_id
- empty_calendar
- verybad
HTH
Hi @laurentg ,
That helps a lot. If I may, I have another question.
Would you say I can use the existing rule set to figure out how long a GTFS feed is valid? How many days until the feed is expired?
That might be an important piece of information for a system that needs to update a GTFS zip file before it expires.
Or, would that be an additional new rule?
Cheers!
Did you check whether the validator.CalendarValidator.expiredCutoffDate configuration mentioned in the README.md already matches your requirements? You'll need to specify a fixed date, not a relative period.
If you'd like to contribute this feature, class CalendarValidator would be the place to start.
I think @hbruch already answered, indeed you can use CalendarValidator and specify a cutoff date.
Having said that, the current configuration is a static date you have to specify. It could be an improvement to be able to specify a number of days after "now" (the date at which the validator is run).
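Until a relative setting exists, one possible workaround is to regenerate the static date before each run. The following is only a sketch: the helper name cutoff_property_line and the idea of a wrapper script are assumptions, not part of gtfsvtor. Only the property name and the YYYY/MM/DD layout are taken from the config posted earlier in this thread.

```python
from datetime import date, timedelta

# Property name as it appears in the config posted in this thread.
PROPERTY = "validator.CalendarValidator.expiredCutoffDate"


def cutoff_property_line(days_ahead, today=None):
    """Build a config.properties line whose date is `days_ahead` days after today.

    Hypothetical helper: gtfsvtor itself only accepts a static date here.
    """
    today = today or date.today()
    cutoff = today + timedelta(days=days_ahead)
    # Same YYYY/MM/DD layout as the dates in the posted config.
    return f"{PROPERTY}={cutoff.strftime('%Y/%m/%d')}"


if __name__ == "__main__":
    # Prints a line that a wrapper script could write into config.properties.
    print(cutoff_property_line(30))
```

A wrapper script could write this line into config.properties and then call gtfsvtor with -c config.properties as in the original command.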
Hi @hbruch ,
Hi @laurentg ,
Thank you very much for your thoughts and the insights into gtfsvtor.
According to CalendarValidator, the rules validator.CalendarValidator.expiredCutoffDate and validator.CalendarValidator.futureCutoffDate use the feed_start_date and feed_end_date fields from the feed_info.txt file.
I am working with feeds that do not provide the feed_info.txt file.
That is why I am looking for a rule that tells me whether a feed is expired by evaluating the calendar.txt and calendar_dates.txt files.
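A minimal sketch of such a check, written as a standalone Python script rather than as a gtfsvtor rule. It assumes both files sit at the root of the zip and use the standard GTFS column names (end_date, date, exception_type). The function names are made up for illustration. The result is an upper bound: it ignores weekday flags and exception_type 2 removals, so the real last day of service may be earlier.

```python
import csv
import io
import zipfile
from datetime import date, datetime


def _rows(zf, name):
    """Yield the rows of a GTFS table as dicts; yield nothing if the file is absent."""
    if name not in zf.namelist():
        return
    with zf.open(name) as fh:
        # utf-8-sig drops a leading byte-order mark if the file has one.
        yield from csv.DictReader(io.TextIOWrapper(fh, encoding="utf-8-sig"))


def _parse(value):
    """Parse a GTFS date (YYYYMMDD)."""
    return datetime.strptime(value.strip(), "%Y%m%d").date()


def last_service_date(feed_zip):
    """Return the latest date for which the feed defines service, or None."""
    candidates = []
    with zipfile.ZipFile(feed_zip) as zf:
        for row in _rows(zf, "calendar.txt"):
            candidates.append(_parse(row["end_date"]))
        for row in _rows(zf, "calendar_dates.txt"):
            # exception_type 1 adds service on that date; 2 removes it.
            if row["exception_type"].strip() == "1":
                candidates.append(_parse(row["date"]))
    return max(candidates, default=None)


def days_until_expiry(feed_zip, today=None):
    """Days from `today` to the last service date; negative if already expired."""
    last = last_service_date(feed_zip)
    if last is None:
        return None
    return (last - (today or date.today())).days
```

A system that refreshes feeds could call days_until_expiry on the downloaded zip and trigger an update once the value drops below a chosen threshold.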
Would you recommend this approach or steer it another way?
Would that be a new rule?
Would you still add it to the CalendarValidator?
Cheers!
@dancesWithCycles Can you confirm that this issue can be closed?