metomi/isodatetime

Custom/"non-standard" use of T character causes it to disappear in dump

hjoliver opened this issue · 16 comments

Noticed by a cylc user:

% cylc cycle-point --template=reduced.CCYYMMDD00.t+3 20150808T00 reduced.CCYYMMDD00.t+3
reduced.2015080800.t+3

% cylc cycle-point --template=reduced.CCYYMMDD00.T+3 20150808T00 reduced.CCYYMMDD00.T+3
reduced.2015080800.+3

(note the disappearing capital-T).

Or is cylc cycle-point abusing TimePointDumper by passing in general string templates like this? The class docstring says "Anything not matched will get left as it is in the string" and "T is special but is left alone, date/time separator".

In general, the CCYY notation is only intended/guaranteed for ISO 8601(-like) date-times. This isn't great behaviour though, so I'll have a look.

I think the strftime notation is better for this kind of template.

Steps to reproduce:

>>> from metomi.isodatetime.data import TimePoint
>>> point = TimePoint(year=2000)
>>> from metomi.isodatetime.dumpers import TimePointDumper
>>> dumper = TimePointDumper()

>>> dumper.dump(point, 'CCYYMMDD00.T+3')
'2000010100.+3'

I have also noticed that 'T' must be immediately followed by 'hh' for the time to be dumped:

>>> dumper.dump(point, 'Thh:mm')
'T00:00'
>>> dumper.dump(point, 'hh:mm')
'hh:mm'
>>> dumper.dump(point, 'T hh:mm')
'T hh:mm'

Is this a problem?

I've had a look at the function responsible for this behaviour:

def _get_expression_and_properties(self, formatting_string):

Given 'T+03':

  • It detects the time part of the string based on what's after the first 'T' in the string (+03)
  • It detects the time zone part based on what's after '+' in the time part, and strips that out of the time part, so the time part becomes empty
  • Only if the time part is non-empty does it append 'T' plus the time part to the output, so here the 'T' is dropped
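To make those three steps concrete, here is a deliberately simplified model of the splitting logic. It is not the real _get_expression_and_properties() code: split_format() and rebuild() are hypothetical names, and treating everything from the first '+' or '-' in the time part as the time zone is an assumption.

```python
import re

# Simplified model of the splitting described above; NOT the real
# TimePointDumper._get_expression_and_properties() implementation.
# Assumption: the time zone starts at the first '+' or '-' in the time part.
TZ_START = re.compile(r"[+-]")


def split_format(fmt):
    """Split a dump format into (date, time, time zone) parts."""
    parts = fmt.split("T")
    date_part = parts[0]
    # Only the text between the first and second 'T' becomes the time part;
    # anything after a second 'T' is silently ignored.
    time_part = parts[1] if len(parts) > 1 else ""
    tz_match = TZ_START.search(time_part)
    if tz_match:
        tz_part = time_part[tz_match.start():]
        time_part = time_part[:tz_match.start()]
    else:
        tz_part = ""
    return date_part, time_part, tz_part


def rebuild(fmt):
    """Reassemble the parts; 'T' is only re-emitted for a non-empty time part."""
    date_part, time_part, tz_part = split_format(fmt)
    result = date_part
    if time_part:
        result += "T" + time_part
    return result + tz_part


print(rebuild("CCYYMMDD00.T+3"))  # CCYYMMDD00.+3
print(split_format("TCC"))        # ('', 'CC', '')
```

With this model the 'T' in 'CCYYMMDD00.T+3' vanishes just as in the report, and 'TCC' ends up with 'CC' treated as the time part.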

is cylc cycle-point abusing TimePointDumper by passing in general string templates like this?

That would be my impression.

I think (echoing Ben's comment above) that this syntax works for ISO8601-esque formats (e.g. Thh:mm) but it should work more generally.

Only the letters PCYMDWhmsw (and also +-0123456789 in the suffix?) should be special, you should be free to use other letters as you please. So the letter T should behave the same as the letter Q.
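A minimal sketch of that behaviour, assuming a hypothetical TOKENS table and render() helper that work on a plain datetime.datetime rather than an isodatetime TimePoint. Only the listed letter runs are replaced; 'T' and every other character are copied through unchanged. Week dates, time zones and the +/- suffix are left out to keep it short.

```python
import re
from datetime import datetime

# Hypothetical token table: only these letter runs are special; every other
# character (including 'T') is copied through literally.
TOKENS = {
    "CCYY": lambda d: f"{d.year:04d}",
    "CC": lambda d: f"{d.year // 100:02d}",
    "YY": lambda d: f"{d.year % 100:02d}",
    "MM": lambda d: f"{d.month:02d}",
    "DD": lambda d: f"{d.day:02d}",
    "hh": lambda d: f"{d.hour:02d}",
    "mm": lambda d: f"{d.minute:02d}",
    "ss": lambda d: f"{d.second:02d}",
}
# Longest tokens first so "CCYY" wins over "CC" at the same position.
TOKEN_RE = re.compile(
    "|".join(re.escape(t) for t in sorted(TOKENS, key=len, reverse=True))
)


def render(fmt, when):
    """Replace each recognised token in a single left-to-right pass."""
    return TOKEN_RE.sub(lambda m: TOKENS[m.group(0)](when), fmt)


when = datetime(2000, 1, 1)
print(render("CCYYMMDD00.T+3", when))  # 2000010100.T+3
print(render("TCC", when))             # T20
```

Because tokens are matched longest-first in one pass, 'CCYY' is never split into 'CC' + 'YY', while 'CC' and 'YY' can still be used on their own.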

Note that documenting limitations might be sufficient to close this.

I have come up with a simple fix, at least for the OP's use case of 'T' followed by '+' (or something time-zone-like). Who wants to review?

How easy would it be to generalise this to allow T anywhere?

>>> dumper.dump(point, 'CC')
'20'
>>> dumper.dump(point, 'TCC')
'TCC'

How easy would it be to generalise this to allow T anywhere?

I think that would be a total rewrite of TimePointDumper._get_expression_and_properties(), because it works by splitting the format string on 'T', taking string 0 to be the date bit and string 1 to be the time (including time zone) bit, and ignoring everything after any second 'T'.

So for 'TCC', 'CC' is treated as the time bit.

I'm really not sure why we have special logic for this CCYYMMDDhhmmss syntax in the first place.

I would have expected isodatetime to translate this into %Y%m%d%H%M%S internally to avoid the need for two orthogonal parsing methods.

Here's a quick demo of how that would work:

from metomi.isodatetime.data import TimePoint
from metomi.isodatetime.dumpers import TimePointDumper

dumper = TimePointDumper()
timepoint = TimePoint(year=2000)

formats = [
    'CCYYMMDDThhmm',
    'CCYYMMDDThhmmT',
    'CCYYMMDDThhTmm',
    'CCYYMMDDTThhmm',
    'CCYYMMTDDThhmm',
    'CCYYTMMDDThhmm',
    'CCTYYMMDDThhmm',
]

patterns = {
    'CCYY': '%Y',
    'MM': '%m',
    'DD': '%d',
    'hh': '%H',
    'mm': '%M',
    'ss': '%S'
}

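# Compare the current CCYY-style dump with the same format translated
# into strftime directives by plain string substitution.
for fmt in formats:
    print(fmt)
    print('\t', dumper.dump(timepoint, fmt))

    newformat = fmt
    for patt, repl in patterns.items():
        newformat = newformat.replace(patt, repl)

    print('\t', dumper.dump(timepoint, newformat))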
Output (for each format, the first indented line is the dump from this branch and the second is the dump after substitution):

CCYYMMDDThhmm
	 20000101T0000
	 20000101T0000
CCYYMMDDThhmmT
	 20000101T0000.       <- missing the trailing T and it's got a random dot
	 20000101T0000T
CCYYMMDDThhTmm
	 20000101T00.           <- missing the trailing T00 and it's got a random dot.
	 20000101T00T00
CCYYMMDDTThhmm
	 20000101T             <- missing the trailing T0000 (but no dot).
	 20000101TT0000
CCYYMMTDDThhmm
	 200001TDD           <- DD?
	 200001T01T0000
CCYYTMMDDThhmm
	 2000TMMDD         <- MMDD?
	 2000T0101T0000
CCTYYMMDDThhmm
	 20TYYMMDD.         <- YYMMDD?
	 CCTYY0101T0000.     <- So we might not be able to support CC via this method.
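One more catch with chained str.replace() calls: any literal '%' in a template goes straight through to strftime, where it is read as the start of a directive (so 'hh%d' would dump the day). A single-pass regex translation can escape it instead. This is a sketch with a hypothetical to_strftime() helper, checked against the standard library's datetime rather than TimePointDumper; 'CC' and 'YY' on their own are still not translated.

```python
import re
from datetime import datetime

# Hypothetical token -> strftime mapping; 'CC' and 'YY' on their own have no
# portable strftime directive, so they are not translated.
TO_STRFTIME = {"CCYY": "%Y", "MM": "%m", "DD": "%d",
               "hh": "%H", "mm": "%M", "ss": "%S"}
# Longest tokens first, plus a bare '%' so literal percent signs get escaped.
TRANSLATE_RE = re.compile(
    "|".join(re.escape(t) for t in sorted(TO_STRFTIME, key=len, reverse=True))
    + "|%"
)


def to_strftime(fmt):
    """Translate a CCYY-style template into strftime directives in one pass."""
    return TRANSLATE_RE.sub(lambda m: TO_STRFTIME.get(m.group(0), "%%"), fmt)


when = datetime(2000, 1, 1)
print(to_strftime("CCYYMMDDThhmmT"))                 # %Y%m%dT%H%MT
print(when.strftime(to_strftime("CCYYMMDDThhmmT")))  # 20000101T0000T
print(when.strftime(to_strftime("hh%d")))            # 00%d
```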

@benfitzpatrick if you're lurking on GitHub and have a mo, is there a reason why the CCYYMM syntax has special parsing logic or can/should it just get translated into %Y%m etc?

How does that perform with week dates? Currently there's some logic that means that when you have a dump format of CCYY-Www-D the CCYY bit is the week year, whereas for the same TimePoint a dump format of CCYY-MM-DD the CCYY bit is the Gregorian year.

e.g. Monday 30th December 2019

from metomi.isodatetime.data import TimePoint
from metomi.isodatetime.dumpers import TimePointDumper
dumper = TimePointDumper()

point = TimePoint(year=2019, month_of_year=12, day_of_month=30)

print(dumper.dump(point, 'CCYY-MM-DD'))
# 2019-12-30

print(dumper.dump(point, 'CCYY-Www-D'))
# 2020-W01-1
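Python's standard library shows the same split between the two kinds of year. This sketch uses datetime.date.isocalendar() rather than isodatetime, so it only illustrates the calendar rule, not TimePointDumper's behaviour.

```python
from datetime import date

day = date(2019, 12, 30)  # Monday 30th December 2019
iso_year, iso_week, iso_weekday = day.isocalendar()

print(day.strftime("%Y-%m-%d"))                         # 2019-12-30
print(f"{iso_year:04d}-W{iso_week:02d}-{iso_weekday}")  # 2020-W01-1
```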

datetime.datetime.strftime has inconsistencies with ISO 8601 when it comes to week date representations:

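  • %W (week of year) counts weeks starting on a Monday, with any days before the first Monday of the year in week 00, so it is not the ISO 8601 week number (which starts from 01).
  • %w (day of week) runs from 0 (Sunday) to 6 (Saturday), whereas ISO 8601 numbers days from 1 (Monday) to 7 (Sunday).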
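Both differences can be checked with the standard library alone; isocalendar() and isoweekday() provide the ISO 8601 values for comparison.

```python
from datetime import date

new_year = date(2019, 1, 1)  # a Tuesday, before the first Monday of 2019
sunday = date(2019, 12, 29)

# %W puts days before the first Monday in week 00; ISO 8601 says week 1.
print(new_year.strftime("%W"), new_year.isocalendar()[1])  # 00 1
# %w numbers Sunday as 0; ISO 8601 numbers it 7.
print(sunday.strftime("%w"), sunday.isoweekday())          # 0 7
```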

I suppose we could add

"%W": ["week_of_year"],
"%w": ["day_of_week"],

to

STRFTIME_TRANSLATE_INFO = {
    "%d": ["day_of_month"],
    "%F": ["century", "year_of_century", "-", "month_of_year", "-",
           "day_of_month"],
    "%H": ["hour_of_day"],
    "%j": ["day_of_year"],
    "%m": ["month_of_year"],
    "%M": ["minute_of_hour"],
    "%s": (
        r"(?P<seconds_since_unix_epoch>[0-9]+[,.]?[0-9]*)",
        "%(seconds_since_unix_epoch)s", "seconds_since_unix_epoch"),
    "%S": ["second_of_minute"],
    "%X": ["hour_of_day", ":", "minute_of_hour", ":", "second_of_minute"],
    "%Y": ["century", "year_of_century"],
    "%z": ["time_zone_sign", "time_zone_hour_abs", "time_zone_minute_abs"],
}

noting the difference with respect to strftime in the TimePointDumper() docstring or somewhere.

However we would still need:

  • the logic to dump the ISO week year instead of the Gregorian year depending on the format string
  • possibly a % letter for the ISO week year (datetime.datetime.strftime has only supported %G for this, along with %V and %u for the ISO week and weekday, since Python 3.6)
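The first point could look something like the following, assuming a hypothetical year_for_format() helper and a simple rule: use the ISO week-numbering year whenever the format contains the 'Www' week token. It works on a datetime.date, not an isodatetime TimePoint.

```python
from datetime import date


def year_for_format(day, fmt):
    """Return the year that a 'CCYY' token should dump for this format.

    Hypothetical rule: a format containing the ISO week token 'Www' gets the
    ISO week-numbering year; any other format gets the Gregorian year.
    """
    if "Www" in fmt:
        return day.isocalendar()[0]
    return day.year


day = date(2019, 12, 30)  # Monday 30th December 2019
print(year_for_format(day, "CCYY-MM-DD"))  # 2019
print(year_for_format(day, "CCYY-Www-D"))  # 2020
```

A plain substring check like this is the crudest possible rule; a real implementation would need to decide per token, which is part of why the problem gets nasty.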

Darn, the solution might be nastier than I thought...

For info, #154 has provided a solution to the reported issue. However, there does not seem to be a nice way to remove the quirky behaviour of the letter T without creating more quirks than it solves, so I'm closing this issue as "solved in as much as we can expect it to be".