TIMEZONE setting unable to handle two-digit UTC offset
These UTC offsets behave as expected:
import dateparser
dateparser.parse('tomorrow', settings={'TIMEZONE': 'UTC+8'})
dateparser.parse('tomorrow', settings={'TIMEZONE': 'UTC+08'})
dateparser.parse('tomorrow', settings={'TIMEZONE': 'UTC+0800'})
dateparser.parse('tomorrow', settings={'TIMEZONE': 'UTC+8:00'})
dateparser.parse('tomorrow', settings={'TIMEZONE': 'UTC+08:00'})
dateparser.parse('tomorrow', settings={'TIMEZONE': '+0800'})
dateparser.parse('tomorrow', settings={'TIMEZONE': '+08:00'})
However, if the timezone string omits both the "UTC" prefix and the minutes, pytz raises UnknownTimeZoneError. Example:
import dateparser
dateparser.parse('tomorrow', settings={'TIMEZONE': '+08'})
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3/dist-packages/dateparser/conf.py", line 92, in wrapper
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/dateparser/__init__.py", line 61, in parse
data = parser.get_date_data(date_string, date_formats)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/dateparser/date.py", line 451, in get_date_data
parsed_date = _DateLocaleParser.parse(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/dateparser/date.py", line 200, in parse
return instance._parse()
^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/dateparser/date.py", line 204, in _parse
date_data = self._parsers[parser_name]()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/dateparser/date.py", line 224, in _try_freshness_parser
return freshness_date_parser.get_date_data(self._get_translated_date(), self._settings)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/dateparser/freshness_date_parser.py", line 156, in get_date_data
date, period = self.parse(date_string, settings)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/dateparser/freshness_date_parser.py", line 91, in parse
now = apply_timezone(utc_dt, settings.TIMEZONE)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/dateparser/utils/__init__.py", line 119, in apply_timezone
new_datetime = apply_tzdatabase_timezone(date_time, tz_string)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/dateparser/utils/__init__.py", line 94, in apply_tzdatabase_timezone
usr_timezone = timezone(pytz_string)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/pytz/__init__.py", line 201, in timezone
raise UnknownTimeZoneError(zone)
pytz.exceptions.UnknownTimeZoneError: '+08'
Dateparser should support two-digit UTC offsets because the Python standard library sometimes returns such offsets. For example:
$ TZ=:Asia/Singapore python3
Python 3.11.2 (main, Mar 13 2023, 12:18:29) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import datetime
>>> datetime.datetime.now().astimezone().tzname()
'+08'
Please fix dateparser so that the TIMEZONE setting is able to handle two-digit UTC offsets such as '+08'.
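Until that is fixed, a caller-side workaround might look like the sketch below. It expands a bare hour offset such as '+08' into '+08:00', one of the forms reported above as working. The helper name normalize_utc_offset is hypothetical and is not part of dateparser. This is only a sketch of one possible approach, not how dateparser itself would implement the fix.

```python
import re


def normalize_utc_offset(tz_string):
    """Expand a bare hour offset such as '+08' to '+08:00'.

    Hypothetical helper, not part of dateparser. Any string that is not
    a sign followed by one or two digits is returned unchanged.
    """
    match = re.fullmatch(r'([+-])(\d{1,2})', tz_string)
    if match is None:
        return tz_string
    sign, hours = match.groups()
    return f'{sign}{int(hours):02d}:00'


# Intended use (requires dateparser to be installed):
#   dateparser.parse('tomorrow',
#                    settings={'TIMEZONE': normalize_utc_offset('+08')})
```

Strings such as 'UTC+8' or '+0800' do not match the pattern and pass through untouched, so the helper should be safe to apply to any TIMEZONE value.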
I suggested @cwfoo open this bug report, but after investigating this issue a little further, it does seem like an odd timezone parameter...
I have a patch for undertime in here that tries to work around that issue:
https://gitlab.com/anarcat/undertime/-/merge_requests/22
I'm not sure what the right way to go is here. The best would be for dateparser to accept actual tzinfo objects instead of having to pass them as a string in the environment.
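In the meantime, one way to avoid tzname() altogether might be to derive the string from the aware datetime's offset using strftime('%z'). That yields the '+0800' form, which the issue description lists as working. The function name local_offset_string is made up for illustration, and this is only a sketch, not necessarily what the undertime patch does.

```python
import datetime


def local_offset_string(now=None):
    """Return the UTC offset of an aware datetime as '+HHMM'.

    Hypothetical helper. Defaults to the current local time when no
    datetime is given.
    """
    if now is None:
        now = datetime.datetime.now().astimezone()
    offset = now.strftime('%z')
    if not offset:
        # strftime('%z') gives an empty string for naive datetimes.
        raise ValueError('datetime is naive; no UTC offset available')
    return offset


# Intended use (requires dateparser to be installed):
#   dateparser.parse('tomorrow',
#                    settings={'TIMEZONE': local_offset_string()})
```

This sidesteps the ambiguity of tzname(), which may return an abbreviation or a bare offset like '+08' depending on the zone.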
> but after investigating this issue a little further, it does seem like an odd timezone parameter...
Indeed.
> The best would be for dateparser to accept actual tzinfo objects instead of having to pass them as a string in the environment.
Sounds like a valid enhancement.
Maybe you could edit the title and description of the issue to be about this enhancement. Or close this issue and open a new one about the enhancement.