akoumjian/datefinder

REPLACEMENTS not comprehensive enough?

julianss opened this issue · 0 comments

There are a lot of words that the regex recognizes, such as the "positionnal tokens", "extra tokens" and so on, but dateutil then fails to parse them. Take, for example, this date, which isn't recognized when preceded by "last" but is recognized when preceded by "by":

In [126]: [x for x in datefinder.find_dates("last Mar-31-2023", source=True)]
Out[126]: []

In [127]: [x for x in datefinder.find_dates("by Mar-31-2023", source=True)]
Out[127]: [(datetime.datetime(2023, 3, 31, 0, 0), 'by Mar-31-2023')]
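To make the failure mode concrete, here is a minimal, self-contained sketch. It does not use datefinder or dateutil. `strict_parse` is a hypothetical stand-in that uses `datetime.strptime` and rejects any leftover word. The one-entry `REPLACEMENTS` dict is an assumption for illustration, not datefinder's actual dict.

```python
import re
from datetime import datetime

# Hypothetical stand-in for datefinder's REPLACEMENTS: only "by" is stripped here.
REPLACEMENTS = {"by": " "}

def strict_parse(text):
    """Parse 'Mon-DD-YYYY' only; any leftover word makes it fail (stand-in for dateutil)."""
    return datetime.strptime(text.strip(), "%b-%d-%Y")

def clean(text):
    # Replace each listed word (whole word, case-insensitive) with its replacement.
    for word, repl in REPLACEMENTS.items():
        text = re.sub(rf"\b{re.escape(word)}\b", repl, text, flags=re.IGNORECASE)
    return text

def try_parse(captured):
    # Mimics "regex captured it, but the parser rejects it" by returning None on failure.
    try:
        return strict_parse(clean(captured))
    except ValueError:
        return None

print(try_parse("by Mar-31-2023"))    # "by" is stripped, so parsing succeeds
print(try_parse("last Mar-31-2023"))  # "last" survives cleaning, so parsing fails
```

The second call returns `None` because "last" is not in the replacement map, which mirrors the empty result for `"last Mar-31-2023"` above.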

There is a REPLACEMENTS dict that strips problematic words. Shouldn't this dict be expanded to strip every word the regex can recognize, or am I missing something?
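Here is a minimal sketch of the proposal, assuming the regex token groups are plain `word|word|...` alternations. The idea is to derive the replacement map from the same patterns the regex uses, so anything the regex can match is also stripped before parsing. `POSITIONAL_TOKENS`, `EXTRA_TOKENS`, `build_replacements`, and `clean` are hypothetical names and values, not datefinder's real constants or API.

```python
import re
from datetime import datetime

# Hypothetical token groups; datefinder's real constants may differ.
POSITIONAL_TOKENS = r"next|last"
EXTRA_TOKENS = r"due|by|on|during|standard|daylight|savings|time|date|dated|of|to|through|between|until|at|day"

def build_replacements(*patterns):
    """Turn flat regex alternations into a {word: " "} map so every matchable word gets stripped."""
    words = set()
    for pattern in patterns:
        words.update(w for w in pattern.split("|") if w)
    return {w: " " for w in sorted(words)}

REPLACEMENTS = build_replacements(POSITIONAL_TOKENS, EXTRA_TOKENS)

def clean(text, replacements=REPLACEMENTS):
    # One combined whole-word, case-insensitive pattern; \b keeps "to" from matching inside "today".
    alternation = "|".join(map(re.escape, replacements))
    return re.sub(
        rf"\b(?:{alternation})\b",
        lambda m: replacements[m.group(0).lower()],
        text,
        flags=re.IGNORECASE,
    )

cleaned = clean("last Mar-31-2023").strip()
print(cleaned)                                  # the positional token is gone
print(datetime.strptime(cleaned, "%b-%d-%Y"))   # now parses as 2023-03-31
```

Splitting on `|` only works for flat alternations. Patterns with groups or character classes would need a different approach. Also, stripping "last" or "next" discards their relative meaning, which may or may not be acceptable.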