simonpoole/OpeningHoursParser

Possibility to exclude certain strings from parsing

Opened this issue · 5 comments

Hey @simonpoole,
I tried to parse some of the following examples:

daily 05.00 am - 09.00 pm
and also
06.00 a.m. - 07.00 p.m.

Unfortunately, neither passed the parser's non-strict mode, because of the fragments 'daily' and 'a.m.'. Is support for these planned?

In the meantime, would it be possible to provide some additional functionality to help us out? Perhaps an option that lets the user exclude certain strings, such as 'a.m.', 'daily' and others, by defining them in advance?

In general there are two ways this could be done:

  • restarting parsing after it fails for unknown tokens (however this wouldn't result in a valid spec in many cases)
  • skipping certain predefined strings in lexical analysis, disadvantage: they have to be predefined so would need to be fairly common for this to make sense
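The second option can be illustrated with a minimal sketch. This is a toy tokenizer in Python, not the parser's actual JavaCC lexer; `SKIP_WORDS`, `TOKEN_RE` and `tokenize` are hypothetical names introduced only for illustration.

```python
import re

# Hypothetical skip list (an assumption for this sketch); as noted above,
# only fairly common strings would be worth predefining.
SKIP_WORDS = {"daily"}

# Crude token pattern: runs of letters, runs of digits, or any single
# non-whitespace character (punctuation such as "." or "-").
TOKEN_RE = re.compile(r"[A-Za-z]+|\d+|\S")


def tokenize(text, skip_words=SKIP_WORDS):
    """Split text into crude tokens, dropping any word found in skip_words."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        token = match.group(0)
        if token.lower() in skip_words:
            continue  # predefined noise word: never reaches the grammar
        tokens.append(token)
    return tokens
```

With this sketch, `tokenize("daily 05.00 am - 09.00 pm")` yields the same tokens as `tokenize("05.00 am - 09.00 pm")`, so a grammar consuming the tokens would never see 'daily'.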

As to "a.m." and "p.m.", the same applies as above: there needs to be a non-negligible amount of use for adding these to make sense. We already have a bad case of diminishing returns with a lot of the special cases we are handling in non-strict mode.

ypid commented

skipping certain predefined strings in lexical analysis, disadvantage: they have to be predefined so would need to be fairly common for this to make sense

Maybe that helps: https://github.com/opening-hours/opening_hours.js/blob/master/locales/word_error_correction.yaml

@ges1227 I've done some work on this by skipping such tokens in lexical analysis. This works well from a purely functional point of view. Unfortunately, it makes the implementation of the parser's strict and non-strict modes rather messy, and it forces us to throw a Java Error instead of an Exception when we detect such a token in strict mode, which will cause validators to moan endlessly (this is due to an architectural wart of JavaCC). So I'm not quite sure the code should really be included; I need to think about it a bit.

@simonpoole, @ypid Thanks for your support, it really brought me further!
So I have been experimenting with the YAML file and arrived at a rather satisfying solution. Basically, my input strings are filtered before they reach the OpeningHoursParser: offending words are replaced according to the definitions in the YAML file (small example attached).

Therefore no worries about a messy parser anymore, your tips helped me to manage the problem.
As a side note, the code is far from perfect. Maybe two YAML files (one for regexes, one for 'normal' words) would make it easier to distinguish whether a regex or a 'normal' replacement should be applied to the string.
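For readers who want to try the same idea, here is a rough sketch of such a pre-filter in Python. It is not the attached example. The two tables stand in for the two YAML files suggested above, and their contents, like the names `LITERAL_REPLACEMENTS`, `REGEX_REPLACEMENTS` and `prefilter`, are assumptions made for illustration; a real setup would load the definitions from the YAML file(s) instead.

```python
import re

# Hypothetical table of 'normal' words: each is matched as a whole word,
# case-insensitively. Mapping 'daily' to an empty string simply drops it.
LITERAL_REPLACEMENTS = {
    "daily": "",
}

# Hypothetical table of regex rules, kept separate so that it is always
# clear which kind of replacement is being applied.
REGEX_REPLACEMENTS = [
    (re.compile(r"\ba\.m\.", re.IGNORECASE), "am"),
    (re.compile(r"\bp\.m\.", re.IGNORECASE), "pm"),
]


def prefilter(text):
    """Clean an opening-hours string before handing it to a parser."""
    # Regex rules first, so dotted forms like 'a.m.' are normalised.
    for pattern, replacement in REGEX_REPLACEMENTS:
        text = pattern.sub(replacement, text)
    # Then the plain words, escaped so they are never treated as regexes.
    for word, replacement in LITERAL_REPLACEMENTS.items():
        text = re.sub(r"\b" + re.escape(word) + r"\b", replacement,
                      text, flags=re.IGNORECASE)
    # Collapse whitespace left behind by removed words.
    return " ".join(text.split())
```

Applied to the two strings from the original report, `prefilter` returns `05.00 am - 09.00 pm` and `06.00 am - 07.00 pm`, which can then be passed to the parser in non-strict mode.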

a.m. and p.m. supported via 0acfaa6