Date parse issue
archerne opened this issue · 1 comments
Any idea why "2020 FEB 10 PM 4:52" parses as 2020-02-09 04:52:00?
Your 9 is coming from the date on which you ran the code. It didn't recognize the 10 as a day of the month, so it is falling back to dateutil's base_date.
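A rough sketch of that fallback, using only the standard library: the fields that were recognized are laid over a default date, and anything missing (here, the day) keeps the default's value. `fill_missing` and the field dictionary are hypothetical illustrations, not dateutil's or datefinder's actual code, and the run date of 9 February is an assumption inferred from the output in the question.

```python
from datetime import datetime


def fill_missing(parsed_fields, default):
    """Overlay the fields a parser recognized onto a default datetime.

    Hypothetical helper for illustration only; not dateutil's implementation.
    """
    return default.replace(**parsed_fields)


# Assumed: the code was run on 9 February 2020, so that is the default date.
run_date = datetime(2020, 2, 9)

# Year, month, and a time were recognized; no day of the month was.
recognized = {"year": 2020, "month": 2, "hour": 4, "minute": 52}

result = fill_missing(recognized, run_date)
print(result)  # 2020-02-09 04:52:00
```

The day in the result is simply whatever the default carried, which is why the same input gives a different day depending on when it is run.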
The position of the PM is throwing it off. It is highly unusual to place PM both before the hours/minutes and right after a date. If you run it without the time at the end:
In [15]: text = "2020 FEB 10 PM"
In [16]: print(next(datefinder.find_dates(text, source=True)))
(datetime.datetime(2020, 2, 29, 22, 0), '2020 FEB 10 PM')
You can see that it saw 10 PM as it was making its way through the text and rightly cast it as a time. It didn't see anything that looked like a day of the month, so it defaulted to today.
Then when the regex finds 4:52, it says "wait, this is a time!" and uses that time instead, either because it is more specific or because it is found later and overwrites the earlier one. So your 10 is getting dropped altogether.
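One possible workaround, sketched under the assumption that the input was meant as 4:52 PM on 10 February: move the meridiem behind the time before parsing, so that 10 is no longer adjacent to PM. `move_meridiem_after_time` is a hypothetical helper, and the fixed `strptime` format below stands in for the parser only to check the normalized string; it assumes an English/C locale for the month abbreviation.

```python
import re
from datetime import datetime


def move_meridiem_after_time(text):
    """Rewrite 'PM 4:52' as '4:52 PM' (hypothetical helper for illustration)."""
    return re.sub(
        r"\b(AM|PM)\s+(\d{1,2}:\d{2})\b",
        r"\2 \1",
        text,
        flags=re.IGNORECASE,
    )


normalized = move_meridiem_after_time("2020 FEB 10 PM 4:52")
print(normalized)  # 2020 FEB 10 4:52 PM

# Parse the normalized string with a fixed format (English/C locale assumed).
parsed = datetime.strptime(normalized, "%Y %b %d %I:%M %p")
print(parsed)  # 2020-02-10 16:52:00
```

Strings that already have the meridiem after the time pass through unchanged, so the same normalization step can be applied to mixed input.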