Custom/"non-standard" use of T character causes it to disappear in dump
hjoliver opened this issue · 16 comments
Noticed by a cylc user:
% cylc cycle-point --template=reduced.CCYYMMDD00.t+3 20150808T00 reduced.CCYYMMDD00.t+3
reduced.2015080800.t+3
% cylc cycle-point --template=reduced.CCYYMMDD00.T+3 20150808T00 reduced.CCYYMMDD00.T+3
reduced.2015080800.+3
(note the disappearing capital-T).
Or is cylc cycle-point abusing TimePointDumper by passing in general string templates like this? The class docstring says "Anything not matched will get left as it is in the string" and "T is special but is left alone, date/time separator".
In general, the CCYY notation is only intended/guaranteed for ISO 8601(-like) date-times. This isn't great behaviour though, so I'll have a look.
I think the strftime notation is better for this kind of template.
Steps to reproduce:
>>> from metomi.isodatetime.data import TimePoint
>>> point = TimePoint(year=2000)
>>> from metomi.isodatetime.dumpers import TimePointDumper
>>> dumper = TimePointDumper()
>>> dumper.dump(point, 'CCYYMMDD00.T+3')
'2000010100.+3'
I have also noticed that 'T' must be immediately followed by 'hh' for the time to be dumped.
>>> dumper.dump(point, 'Thh:mm')
'T00:00'
>>> dumper.dump(point, 'hh:mm')
'hh:mm'
>>> dumper.dump(point, 'T hh:mm')
'T hh:mm'
Is this a problem?
I've had a look at the function responsible for this behaviour (isodatetime/metomi/isodatetime/dumpers.py, line 168 in a7c9d7d).
Given T+03:
- It detects the time part of the string based on what's after the first 'T' in the string (+03).
- It detects the time zone part based on what's after '+' in the time part, and strips that out of the time part, so the time part becomes empty.
- If the time part is not empty, it appends 'T' plus the time part to the output; so the 'T' here doesn't get outputted.
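To make the three steps above concrete, here is a toy model of them. It is only a sketch of the behaviour as described in this comment, not the actual isodatetime code; the function name `rebuild_template` is made up for illustration, and treating '-' like '+' is an assumption.

```python
def rebuild_template(template):
    """Toy model of the described steps (hypothetical, not library code)."""
    # Step 1: everything after the first 'T' is taken to be the time part.
    date_part, _sep, time_part = template.partition("T")

    # Step 2: anything from the first '+' (or '-', an assumption) onwards
    # is treated as the time zone and stripped out of the time part.
    tz_part = ""
    for sign in "+-":
        if sign in time_part:
            idx = time_part.index(sign)
            time_part, tz_part = time_part[:idx], time_part[idx:]
            break

    # Step 3: 'T' is only re-emitted when the time part is non-empty.
    out = date_part
    if time_part:
        out += "T" + time_part
    return out + tz_part


print(rebuild_template("CCYYMMDD00.T+3"))  # the 'T' is lost
print(rebuild_template("CCYYMMDDThhmm"))   # the 'T' survives
```

With 'CCYYMMDD00.T+3' the time part is empty once '+3' has been stripped, so step 3 never puts the 'T' back, which matches the dump shown in the report.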
is cylc cycle-point abusing TimePointDumper by passing in general string templates like this?
That would be my impression.
I think (echoing Ben's comment above) that this syntax works for ISO8601-esque formats (e.g. Thh:mm) but it should work more generally.
Only the letters PCYMDWhmsw (and also +-0123456789 in the suffix?) should be special, you should be free to use other letters as you please. So the letter T should behave the same as the letter Q.
Note that documenting limitations might be sufficient to close this.
I have come up with a simple fix for the OP use case of 'T' followed by '+' (or time zone like thing) at least. Who wants to review?
How easy would it be to generalise this to allow T anywhere?
>>> dumper.dump(point, 'CC')
'20'
>>> dumper.dump(point, 'TCC')
'TCC'
How easy would it be to generalise this to allow T anywhere?
I think that would be a total rewrite of TimePointDumper._get_expression_and_properties(), because the way it works is to split the format string on 'T', take string 0 to be the date bit and string 1 to be the time (including timezone) bit (and ignore everything after any second 'T').
So for 'TCC', 'CC' is treated as the time bit.
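A minimal sketch of the split described above, just to show where each piece of the format string ends up. `split_on_t` is a hypothetical helper written for this comment, not a function in isodatetime.

```python
def split_on_t(template):
    """Split a format string on 'T' as described (hypothetical helper)."""
    parts = template.split("T")
    date_part = parts[0]                             # string 0: the date bit
    time_part = parts[1] if len(parts) > 1 else ""   # string 1: time + timezone
    ignored = parts[2:]                              # after a second 'T': dropped
    return date_part, time_part, ignored


for template in ("CC", "TCC", "CCYYMMDDThhTmm"):
    print(template, split_on_t(template))
```

For 'TCC' the date bit is empty and 'CC' lands in the time bit, where it is not a recognised time property; for 'CCYYMMDDThhTmm' the trailing 'mm' falls into the ignored list.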
I'm really not sure why we have special logic for this CCYYMMDDhhmmss syntax in the first place.
I would have expected isodatetime to translate this into %Y%m%d%H%M%S internally to avoid the need for two orthogonal parsing methods.
Here's a quick demo of how that would work:
from metomi.isodatetime.data import TimePoint
from metomi.isodatetime.dumpers import TimePointDumper
dumper = TimePointDumper()
timepoint = TimePoint(year=2000)
formats = [
    'CCYYMMDDThhmm',
    'CCYYMMDDThhmmT',
    'CCYYMMDDThhTmm',
    'CCYYMMDDTThhmm',
    'CCYYMMTDDThhmm',
    'CCYYTMMDDThhmm',
    'CCTYYMMDDThhmm',
]
patterns = {
    'CCYY': '%Y',
    'MM': '%m',
    'DD': '%d',
    'hh': '%H',
    'mm': '%M',
    'ss': '%S'
}
for format in formats:
    print(format)
    # dump using the CCYY... template as-is
    print('\t', dumper.dump(timepoint, format))
    # dump again after substituting strftime-style directives
    newformat = format
    for patt, repl in patterns.items():
        newformat = newformat.replace(patt, repl)
    print('\t', dumper.dump(timepoint, newformat))
<FORMAT>
    This branch
    Substitution
CCYYMMDDThhmm
    20000101T0000
    20000101T0000
CCYYMMDDThhmmT
    20000101T0000. <- missing the trailing T and it's got a random dot
    20000101T0000T
CCYYMMDDThhTmm
    20000101T00. <- missing the trailing T00 and it's got a random dot.
    20000101T00T00
CCYYMMDDTThhmm
    20000101T <- missing the trailing T0000 (but no dot).
    20000101TT0000
CCYYMMTDDThhmm
    200001TDD <- DD?
    200001T01T0000
CCYYTMMDDThhmm
    2000TMMDD <- MMDD?
    2000T0101T0000
CCTYYMMDDThhmm
    20TYYMMDD. <- YYMMDD?
    CCTYY0101T0000. <- So we might not be able to support CC via this method.
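For what it's worth, the CC problem in the last case comes from the naive str.replace table only knowing 'CCYY' as a unit. A possible way round it, sketched here with the standard library datetime standing in for TimePoint (an assumption, as are the names `TOKENS` and `dump_template`), is to match tokens longest-first with a regex and format each one directly rather than going through strftime. It ignores week dates and time zones entirely.

```python
import re
from datetime import datetime

# Hypothetical token table: each token maps to a formatter for a datetime.
TOKENS = {
    "CCYY": lambda d: f"{d.year:04d}",
    "CC": lambda d: f"{d.year // 100:02d}",   # century digits
    "YY": lambda d: f"{d.year % 100:02d}",    # year within the century
    "MM": lambda d: f"{d.month:02d}",
    "DD": lambda d: f"{d.day:02d}",
    "hh": lambda d: f"{d.hour:02d}",
    "mm": lambda d: f"{d.minute:02d}",
    "ss": lambda d: f"{d.second:02d}",
}

# Longest tokens first so 'CCYY' wins over 'CC' followed by 'YY'.
TOKEN_RE = re.compile("|".join(sorted(TOKENS, key=len, reverse=True)))


def dump_template(moment, template):
    """Replace known tokens; every other character (including 'T') is kept."""
    return TOKEN_RE.sub(lambda m: TOKENS[m.group(0)](moment), template)


print(dump_template(datetime(2000, 1, 1), "CCTYYMMDDThhmm"))
print(dump_template(datetime(2000, 1, 1), "CCYYMMDDThhmmT"))
```

In this sketch 'T' is just another literal character, so it can appear anywhere in the template.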
@benfitzpatrick if you're lurking on GitHub and have a mo, is there a reason why the CCYYMM syntax has special parsing logic or can/should it just get translated into %Y%m etc?
How does that perform with week dates? Currently there's some logic that means that when you have a dump format of CCYY-Www-D the CCYY bit is the week year, whereas for the same TimePoint a dump format of CCYY-MM-DD the CCYY bit is the Gregorian year.
e.g. Monday 30th December 2019
from metomi.isodatetime.data import TimePoint
from metomi.isodatetime.dumpers import TimePointDumper
dumper = TimePointDumper()
point = TimePoint(year=2019, month_of_year=12, day_of_month=30)
print(dumper.dump(point, 'CCYY-MM-DD'))
# 2019-12-30
print(dumper.dump(point, 'CCYY-Www-D'))
# 2020-W01-1
datetime.datetime.strftime has inconsistencies with ISO 8601 when it comes to week date representations:
- %W (week of year) starts from 00 instead of 01.
- %w (day of week) starts from 0 instead of 1.
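A quick check of those two directives against the ISO 8601 values, using only the standard library. The ISO side uses date.isocalendar() and date.isoweekday(); the sample dates are chosen around the 2019/2020 boundary used in the example above, and the helper name `compare` is made up.

```python
from datetime import date


def compare(day):
    """Return strftime's %W/%w next to the ISO week date for the same day."""
    iso_year, iso_week, iso_weekday = tuple(day.isocalendar())
    return {
        "strftime_week": day.strftime("%W"),     # Monday-based, counts from 00
        "strftime_weekday": day.strftime("%w"),  # Sunday is 0
        "iso": (iso_year, iso_week, iso_weekday),
    }


for day in (date(2019, 1, 1), date(2019, 12, 29), date(2019, 12, 30)):
    print(day.isoformat(), compare(day))
```

For Monday 30th December 2019, %W gives 52 while the ISO week date is week 1 of 2020, and for Sunday 29th December 2019, %w gives 0 where ISO 8601 uses 7.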
I suppose we could add
"%W": ["week_of_year"],
"%w": ["day_of_week"],
to isodatetime/metomi/isodatetime/parser_spec.py (lines 254 to 269 in a7c9d7d), noting the difference with respect to strftime in the TimePointDumper() docstring or somewhere.
However we would still need:
- the logic to dump the ISO week year instead of the Gregorian year depending on the format string
- possibly a new custom % letter for ISO week year (not supported by datetime.datetime.strftime)
Darned, the solution might be nastier...
For info, #154 has provided a solution to the reported issue. However, it would seem that there is no nice way to remove the quirky behaviour of the letter T that doesn't create more quirks than it solves, so closing this issue as "solved in as much as we can expect it to be".