mecatran/gtfsvtor

Validation Result for Calendar Validators

dancesWithCycles opened this issue · 10 comments

Hi folks,
Thank you so much for providing and maintaining this repository. I am very interested in the calendar-related validators and validation rules. I ran gtfsvtor as follows.

~/gtfsvtor/build-tpx200-69-nov-16-2021/gtfsvtor/bin/gtfsvtor -o vtor-res-top-level-dhid-nov-16-2021.html --numThreads 4  -v -c config.properties --jsonAppend --jsonOutput 'vtor-res-top-level-dhid-nov-16-2021.json' pxypihdrpv

The config file config.properties contains the following rules.

validator.CalendarValidator.checkEmptyCalendars=true
validator.CalendarValidator.checkExpired=true
validator.CalendarValidator.expiredCutoffDate=2020/10/30
validator.CalendarValidator.checkFuture=true
validator.CalendarValidator.futureCutoffDate=2022/10/29
validator.CalendarValidator.maxDaysWithoutService=5
validator.CalendarValidator.minDaysForCheckingNoServiceException=70
validator.CalendarDateStreamingValidator.minYearInThePast=2019
validator.CalendarDateStreamingValidator.validator.maxYearInTheFuture=2023
validator.CalendarStreamingValidator.minYearInThePast=2019
validator.CalendarStreamingValidator.validator.maxYearInTheFuture=2023

I configured both HTML and JSON output. However, I cannot find any calendar-related results in either output file. Are those results reported in a different way? Does that mean the validated feed is valid? Is my configuration wrong?

I appreciate any help that points me in the right direction.

Cheers!

In all probability, if no error or warning is reported in the output, the GTFS is valid (at least with regard to the validation rules that are implemented). Calendar errors are reported like any others. Did you expect an error to be reported?

Hi @laurentg ,
I do not know whether the GTFS feed is valid or not. I am studying the rules and trying to understand them. I did not find any documentation of the rules except the output of the --help option, so I am learning by doing. I am planning the following next steps for my study.

  • Tweak the config so that I get calendar errors in the output.
  • Find GTFS feeds that contain calendar errors.

Do you know any example feeds that contain errors that you used while testing gtfsvtor?

Do you have any suggestions for me on how to study the calendar related rules?

If I learn something about the rules that is worth documenting, how would you like it to be added to this repository?

Cheers!

If you look into the src/main/resources/data directory in the source code, you will find many GTFS with errors (used by the unit testing). For calendars, there are some:

  • only_calendar_dates
  • toomanydayswoservice
  • missing_weekday_column
  • missing_calendar
  • bad_date_format
  • bogus_calendars
  • duplicate_schedule_id
  • empty_calendar
  • verybad

HTH

Hi @laurentg ,
That helps a lot. If I may, I have one more question.

Would you say I can use the existing rule set to figure out how long a GTFS feed remains valid, that is, how many days are left until the feed expires?

That might be an important piece of information for a system that needs to update a GTFS zip file before it expires.

Or, would that be an additional new rule?

Cheers!

Did you check whether the validator.CalendarValidator.expiredCutoffDate configuration mentioned in the README.md already matches your requirements? You'll need to specify an absolute date, not a relative period.

If you'd like to contribute this feature, class CalendarValidator would be the place to start.

I think @hbruch already answered this: you can indeed use CalendarValidator and specify a cutoff date.

Having said that, the current configuration is a static date you have to specify. It could be an improvement to be able to specify a number of days after "now" (the date on which the validator is run).
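Until such a relative option exists, one possible workaround is to compute the static date just before each run. The following Python sketch is not part of gtfsvtor; the helper name `cutoff_property` is hypothetical, and the `YYYY/MM/DD` format is assumed from the config.properties shown earlier in this thread.

```python
from datetime import date, timedelta

# Property key as quoted in this thread.
PROPERTY_KEY = "validator.CalendarValidator.expiredCutoffDate"


def cutoff_property(days_ahead, today=None):
    """Return a config.properties line whose cutoff date is `days_ahead`
    days after `today` (defaults to the current date).

    Hypothetical helper; the YYYY/MM/DD format is an assumption based on
    the example configuration in this thread.
    """
    if today is None:
        today = date.today()
    cutoff = today + timedelta(days=days_ahead)
    return f"{PROPERTY_KEY}={cutoff.strftime('%Y/%m/%d')}"


if __name__ == "__main__":
    # Print a line that can be written to config.properties before a run.
    print(cutoff_property(30))
```

The printed line could replace the existing expiredCutoffDate entry in config.properties before gtfsvtor is invoked with -c config.properties.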

Hi @hbruch ,
Hi @laurentg ,
Thank you very much for your thoughts and the insights into gtfsvtor.

According to CalendarValidator, the rules validator.CalendarValidator.expiredCutoffDate and validator.CalendarValidator.futureCutoffDate use the feed_start_date and feed_end_date fields from the feed_info.txt file.

I am working with feeds that do not provide the feed_info.txt file.

That is why I am looking for a rule that tells me whether a feed is expired by evaluating the calendar.txt and calendar_dates.txt files.

Would you recommend this approach, or would you suggest a different one?

Would that be a new rule?

Would you still add it to the CalendarValidator?

Cheers!

This is already the case: the calendar validator takes the last calendar date defined if no feed_info is present (code here). If a feed_info is present, it uses the date defined there, if any (code here).
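For anyone who wants to cross-check the "last calendar date" outside the validator, here is a rough Python sketch. It is not how gtfsvtor computes the date; it only assumes the standard GTFS columns (`end_date` in calendar.txt; `date` and `exception_type` in calendar_dates.txt, where `1` means service added) and an unzipped feed directory. The function names are hypothetical.

```python
import csv
from datetime import date, datetime
from pathlib import Path


def _parse_gtfs_date(value):
    # GTFS dates use the YYYYMMDD format.
    return datetime.strptime(value.strip(), "%Y%m%d").date()


def last_service_date(feed_dir):
    """Return the latest date found in calendar.txt (end_date) or in
    calendar_dates.txt (added service only), or None if neither has rows.

    Simplification: weekday flags and removed-service exceptions are
    ignored, so the real last day of service may be earlier.
    """
    feed_dir = Path(feed_dir)
    candidates = []

    calendar = feed_dir / "calendar.txt"
    if calendar.exists():
        with calendar.open(newline="", encoding="utf-8-sig") as f:
            for row in csv.DictReader(f):
                candidates.append(_parse_gtfs_date(row["end_date"]))

    calendar_dates = feed_dir / "calendar_dates.txt"
    if calendar_dates.exists():
        with calendar_dates.open(newline="", encoding="utf-8-sig") as f:
            for row in csv.DictReader(f):
                if row["exception_type"].strip() == "1":
                    candidates.append(_parse_gtfs_date(row["date"]))

    return max(candidates) if candidates else None


def days_until_expiry(feed_dir, today=None):
    """Days from `today` to the last service date (negative if expired)."""
    last = last_service_date(feed_dir)
    if last is None:
        return None
    return (last - (today or date.today())).days
```

A negative result from `days_until_expiry` would mean the feed has already passed its last listed date, which a system that refreshes GTFS zip files could use as an update trigger.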

@dancesWithCycles Can you confirm that this issue can be closed?

Hi @laurentg ,
Thank you so much for making it crystal clear. Cheers!