fish-face/quasselgrep

Support date ranges with ISO dates

Closed this issue · 1 comments

The word "to" breaks the parsing of ISO-formatted dates in a date range:

$ quasselgrep <options> -t '2016-12-27 to 2017-04-01' <query>
Searching from 2016-01-01 00:00:00 to 2016-12-31 23:59:59.999999.

Compare:

$ quasselgrep <options> -t '27th of December 2016 to 2017-04-01' <query>
Searching from 2016-12-27 00:00:00 to 2017-04-01 23:59:59.999999.

The cause is that the dateparse code from Whoosh prematurely consumes the whitespace between the day of the start date and the word "to". Debug output:

  Seq None sep='' text='2012-02-03  to 2013-04-05'
  Seq None text='2012-02-03  to 2013-04-05'
  Seq None trying=Sequence<simple>[<'(?P<year>[0-9]{4})'>, <'(?P<month>[0-1][0-9])'>, <'(?P<day>[0-3][0-9])'>, <'(?P<hour>([0-1][0-9])|(2[0-3]))'>, <'(?P<minute>[0-5][0-9])'>, <'(?P<second>[0-5][0-9])'>, <'(?P<microsecond>[0-9]{6})'>] at=0
    Seq simple sep='[- .:/]*' text='2012-02-03  to 2013-04-05'
    Seq simple text='2012-02-03  to 2013-04-05'
    Seq simple trying=<'(?P<year>[0-9]{4})'> at=0
    Seq simple result=adatetime(2012, None, None, None, None, None, None)
    Seq simple adding=adatetime(2012, None, None, None, None, None, None) to=adatetime(None, None, None, None, None, None, None)
    Seq simple filled date=adatetime(2012, None, None, None, None, None, None)
    Seq simple text='-02-03  to 2013-04-05'
    Seq simple looking for sep
    Seq simple trying=<'(?P<month>[0-1][0-9])'> at=5
    Seq simple result=adatetime(None, 2, None, None, None, None, None)
    Seq simple adding=adatetime(None, 2, None, None, None, None, None) to=adatetime(2012, None, None, None, None, None, None)
    Seq simple filled date=adatetime(2012, 2, None, None, None, None, None)
    Seq simple text='-03  to 2013-04-05'
    Seq simple looking for sep
    Seq simple trying=<'(?P<day>[0-3][0-9])'> at=8
    Seq simple result=adatetime(None, None, 3, None, None, None, None)
    Seq simple adding=adatetime(None, None, 3, None, None, None, None) to=adatetime(2012, 2, None, None, None, None, None)
    Seq simple filled date=adatetime(2012, 2, 3, None, None, None, None)
    Seq simple text='  to 2013-04-05'
    Seq simple looking for sep
    Seq simple trying=<'(?P<hour>([0-1][0-9])|(2[0-3]))'> at=12
    Seq simple result=None
    Seq simple final=adatetime(2012, 2, 3, None, None, None, None)
  Seq None result=adatetime(2012, 2, 3, None, None, None, None)
  Seq None adding=adatetime(2012, 2, 3, None, None, None, None) to=adatetime(None, None, None, None, None, None, None)
  Seq None filled date=adatetime(2012, 2, 3, None, None, None, None)
  Seq None text='to 2013-04-05'
  Seq None trying=<'(?=(\\s|$))'> at=12
  Seq None result=None
  Seq None failed

Notice the line

  Seq None text='to 2013-04-05'

At this point the parser needs whitespace or the end of the string to finish matching an ISO date (the "simple" format), but finds neither: the "simple" parser has already consumed the trailing spaces, because spaces are valid separators within simple dates.
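The failure can be reproduced outside Whoosh with the two patterns that appear in the debug output. This is only a sketch of the behaviour described above, not Whoosh's actual control flow; the helper name `ends_cleanly` is made up for illustration.

```python
import re

# Both patterns are copied from the debug output above.
SEP = re.compile(r"[- .:/]*")    # separator allowed between "simple" date parts
END = re.compile(r"(?=(\s|$))")  # lookahead that must succeed after the date

def ends_cleanly(text, pos):
    """Mimic the step after the day is parsed: consume separators at `pos`,
    then require whitespace or end-of-string. (Hypothetical helper.)"""
    pos = SEP.match(text, pos).end()
    return END.match(text, pos) is not None

DAY_END = len("2012-02-03")  # offset just past the day field

# The spaces are eaten as separators, so the lookahead lands on "t" and fails.
print(ends_cleanly("2012-02-03  to 2013-04-05", DAY_END))
# A tab is not in the separator class, so it survives for the lookahead.
print(ends_cleanly("2012-02-03 \tto 2013-04-05", DAY_END))
```

The first call reports a failure and the second a success, which is why a non-space whitespace character before "to" sidesteps the problem.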

Fixing the underlying code would be preferable. Failing that, this suggests a kludge: prefix the word "to" with a tab or other non-space whitespace character whenever the whole date range has the form "simple date to ???", which a very simple regex can detect.
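A minimal sketch of that kludge, assuming it runs as a preprocessing step on the `-t` argument before the string reaches the date parser. The function name `protect_range_separator` and the exact pattern are hypothetical; the pattern only covers the `YYYY-MM-DD` form shown in this issue, not every "simple" date the parser accepts.

```python
import re

# Hypothetical pattern: an ISO-style date at the start of the string,
# followed by whitespace and the word "to".
_SIMPLE_RANGE = re.compile(r"^(\s*\d{4}-\d{2}-\d{2}\s+)(to\b)", re.IGNORECASE)

def protect_range_separator(text):
    """Insert a tab before "to" so the date parser still finds whitespace
    after it has consumed the spaces as date separators."""
    return _SIMPLE_RANGE.sub(
        lambda m: m.group(1) + "\t" + m.group(2), text, count=1
    )

# An ISO start date gets the tab; other forms are left untouched.
print(repr(protect_range_separator("2016-12-27 to 2017-04-01")))
print(repr(protect_range_separator("27th of December 2016 to 2017-04-01")))
```

Whether the rest of the range parser accepts a tab before "to" has not been verified here, so this would need testing against the real dateparse code.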