Parsing of milliseconds in dates
juliangamble opened this issue · 22 comments
Here the author writes:
Would it also parse ISO timestamps like 2015-03-05T19:40:53.324Z? E.g. what you get from JavaScript's new Date().toISOString().
What we need is strptime("%Y-%m-%dT%H:%M:%S.%fZ"), similar to how Python does it.
Is progress in #1413 (comment) kind of abandoned?
Hi @juliangamble,
jq 1.6 supports strptime; you can try it:
root@oss-001:test_jq# echo '"2015-03-05T23:51:47Z"' | jq 'strptime("%Y-%m-%dT%H:%M:%SZ")'
[
2015,
2,
5,
23,
51,
47,
4,
63
]
You can see detailed usage in the manual or jq-1.6 manual.yml.
https://github.com/stedolan/jq/blob/master/docs/content/manual/v1.6/manual.yml
I think this issue is about %f support, which jq-1.6 does not have.
% echo '"2015-03-05T23:51:47.487Z"' | jq 'strptime("%Y-%m-%dT%H:%M:%S.%fZ")'
jq: error (at <stdin>:1): date "2015-03-05T23:51:47.487Z" does not match format "%Y-%m-%dT%H:%M:%S.%fZ"
% python -c 'from datetime import datetime; print(datetime.strptime("2015-03-05T23:51:47.487Z", "%Y-%m-%dT%H:%M:%S.%fZ"))'
2015-03-05 23:51:47.487000
Struggled today with this, problem is that the unix "strptime" doesn't support milliseconds. As such, I switched to using a regex replace
.last_updated |= sub("(?<time>.*)\\..*Z"; "\(.time)Z")
So I'm transforming this "2015-03-05T23:51:47.487Z" to "2015-03-05T23:51:47Z"
When you want to format a timestamp which contains offset:
sub("(?<time>.*)\\.[\\d]{3}(?<tz>.*)"; "\(.time)\(.tz)")
This will transform the following:
2015-03-05T23:51:47.487Z
to 2015-03-05T23:51:47Z
And will work with this as well:
2015-03-05T23:51:47.487+0100
to 2015-03-05T23:51:47+0100
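For comparison, the same "drop the fraction before parsing" workaround can be sketched in Python. This is only an illustration of the regex idea above; the name strip_fraction is made up for this example, and it assumes the fraction is directly followed by Z or a numeric offset at the end of the string.

```python
import re

# A dot followed by digits, directly before a trailing "Z" or numeric offset.
_FRACTION = re.compile(r"\.\d+(?=Z$|[+-]\d{2}:?\d{2}$)")


def strip_fraction(timestamp: str) -> str:
    """Remove fractional seconds so that a plain %S format can parse the rest."""
    return _FRACTION.sub("", timestamp, count=1)


print(strip_fraction("2015-03-05T23:51:47.487Z"))
print(strip_fraction("2015-03-05T23:51:47.487+0100"))
```

Timestamps without a fraction pass through unchanged, so the function can be applied unconditionally.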
Was this solved?
I am still not able to parse milliseconds:
jq -n 'now | strftime("%FT%H:%M:%SZ")'
"2022-03-16T19:16:22Z"
You could use gojq:
echo '"2015-03-05T23:51:47.487Z"' | gojq 'strptime("%Y-%m-%dT%H:%M:%S.%fZ")'
[
2015,
2,
5,
23,
51,
47.486999988,
4,
63
]
If you do not care about preserving the number of milliseconds anyway, you can abuse the fact that parsing %G (the ISO 8601 week-based year) has no effect (in glibc, at least) and allows arbitrarily large numbers to be part of the input string.
$ echo '"2015-03-05T23:51:47.487Z"' | jq 'strptime("%Y-%m-%dT%H:%M:%S.%GZ")'
[
2015,
2,
5,
23,
51,
47,
4,
63
]
This seems to be platform-dependent... @hvdijk's example above does not work for me:
$ echo '"2015-03-05T23:51:47.487Z"' | jq 'strptime("%Y-%m-%dT%H:%M:%S.%GZ")'
jq: error (at <stdin>:1): date "2015-03-05T23:51:47.487Z" does not match format "%Y-%m-%dT%H:%M:%S.%GZ"
I think this is because I am on macOS... I have a similar problem with the date command in the shell.
We might need to import an implementation of strftime()/strptime(). By using the one from the OS we end up exposing differences in functionality between all the OSes, which clearly annoys a lot of users including me.
I ran into this issue today while massaging logs. I used this workaround:
Put the text below into fromdate.jq, add the argument -L <directory containing fromdate.jq>, and prepend include "fromdate"; to the filter.
# replace fromdateiso8601 and fromdate with ones that supports fractional seconds
# NOTE: does not support timezones
# Usage:
# $ jq -n -L . 'include "fromdate"; "2024-02-13T11:10:32.123Z" | fromdate'
# 1707822632.123
def fromdateiso8601:
  ( capture("(?<y>\\d+)-(?<m>\\d+)-(?<d>\\d+)T(?<H>\\d+):(?<M>\\d+):(?<S>\\d+)(?<F>\\.\\d+)?Z") as {$y,$m,$d,$H,$M,$S,$F}
  | [$y,$m,$d,$H,$M,$S,0,0]
  | map(tonumber)
  | .[1] |= .-1 # month starts at 0
  | mktime + ($F | if . then tonumber else 0 end)
  ) // error("date \"\(.)\" does not match format");
def fromdate: fromdateiso8601;
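To make the logic of the workaround easier to follow, here is roughly the same approach sketched in Python: capture the fields with a regex, convert the whole seconds with a UTC calendar function, then add the fraction back. The function name from_date_iso8601 is made up for this illustration, and unlike the jq version it requires the whole string to match.

```python
import calendar
import re

# Same fields as the jq capture above; the fraction (including its dot) is optional.
_ISO_UTC = re.compile(r"(\d+)-(\d+)-(\d+)T(\d+):(\d+):(\d+)(\.\d+)?Z")


def from_date_iso8601(text: str) -> float:
    """Convert a UTC ISO 8601 timestamp to seconds since the epoch, keeping fractions."""
    match = _ISO_UTC.fullmatch(text)
    if match is None:
        raise ValueError(f'date "{text}" does not match format')
    *fields, fraction = match.groups()
    year, month, day, hour, minute, second = map(int, fields)
    # calendar.timegm is the UTC counterpart of mktime; months are 1-based here.
    whole = calendar.timegm((year, month, day, hour, minute, second))
    return whole + (float(fraction) if fraction else 0.0)


print(from_date_iso8601("2024-02-13T11:10:32.123Z"))
```

Like the jq version, this does not support time zones other than Z.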
Had a quick look into fixing it in jq; it is a bit messy. Maybe the least painful way is to modify our own strptime in utils.c to add fraction support, but then we also have to have our own tm struct etc., as tm_sec is an int. Are there libc's that have a strptime with fraction support? How do they communicate fractions?
Are there libc's that have a strptime with fraction support?
I suspect not. The standard requires tm_sec to be int, so no implementation can support fractions that way. The only way would be by adding a new member (say, tm_nsec), but that new member cannot be supported by mktime, because standard programs may leave tm_nsec uninitialised, requiring it to be ignored, whereas sensible programs using that hypothetical platform extension would expect the usual mktime behaviour where out-of-range values are valid and the struct tm is normalised by adjusting other fields. If mktime sees a tm_nsec value of e.g. -1, it has no way of knowing whether that was uninitialised and therefore whether it should be ignored.
ksh solves it by having a struct Tm_s which has all the standard struct tm members, plus extra. It has a tmscan function, mostly a wrapper around tmxscan, which does basically the same thing as strptime except for filling a struct Tm_s rather than struct tm. And it even has its own strptime function (I think it's from an era where libc could not be assumed to provide it) that works by calling tmscan and copying the standard fields over from struct Tm_s to struct tm, but since you actually want the new fields too, you would not use it. ksh code could be included in jq directly if its license (EPL) is acceptable, or the same approach can be taken with a custom implementation.
@hvdijk Thanks for the info and breakdown! If someone would look into this, I think there are at least two viable paths as I see it:
- Modify the existing strptime implementation to add %f. That seems to be what other implementations call fractions, but I didn't find where it comes from? ksh seems to have %N?
- Adopt some other strptime implementation, like the one from ksh.
Another concern is how to support both integer seconds and optional fractions. A maybe not optimal but straightforward solution is to redefine fromdate as
def fromdateiso8601: strptime("%Y-%m-%dT%H:%M:%SZ")? // strptime("%Y-%m-%dT%H:%M:%S%fZ") | mktime;
I guess?
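The "try integer seconds first, then fall back to fractions" idea can be illustrated in Python, where %f already exists. This is only a sketch of the fallback pattern, not of how jq would implement it; parse_utc and _FORMATS are names made up for this example.

```python
from datetime import datetime, timezone

# Formats are tried in order; the first one that matches wins.
_FORMATS = ("%Y-%m-%dT%H:%M:%SZ", "%Y-%m-%dT%H:%M:%S.%fZ")


def parse_utc(text: str) -> float:
    """Parse a UTC timestamp with or without fractional seconds into epoch seconds."""
    for fmt in _FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format did not match, try the next one
        return parsed.replace(tzinfo=timezone.utc).timestamp()
    raise ValueError(f'date "{text}" does not match any supported format')


print(parse_utc("2015-03-05T23:51:47Z"))
print(parse_utc("2015-03-05T23:51:47.487Z"))
```

The cost of this pattern is a failed parse attempt for every input that needs the second format.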
I'm also not sure if something like this should be coordinated with other date/time improvements; there are a bunch of issues and old open PRs touching similar things.
Seems to be what other implementations call fractions, but didn't find what it comes from? the ksh seems to have %N?
nicowilliams pointed out in #1413 that there is a lot of variance in other implementations. Ruby appears to use %N, like ksh, to mean fractional seconds, allowing a width to be specified between the % and the N. It is up to the user to specify the decimal separator between %S and %N. R appears to use %OS to mean seconds including decimal separator and fractions, and has the width after it. But %OS is specified in ISO C to mean "the seconds, using the locale’s alternative numeric symbols", rather than "the seconds, including fractions", so it seems questionable to repurpose this to mean something different. ISO 8601 uses a decimal separator between the s's (as in e.g. ss,sss or ss.s) to specify fractional seconds, but of course that does not work for any strftime/strptime. Python uses %f, yes, and it seems to have gone with a different letter because Python specifically did not want to implement the at-the-time established %N, because supporting %3N, %6N, etc. complicated parsing and no other specifier needed anything like that (source). So in Python, %f in strftime is simply always exactly six digits. Presumably jq will want whatever is added to strptime to be added to strftime too.
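A short Python illustration of the %f behaviour described above: strftime always emits six digits, so shorter precision has to be obtained by slicing, while strptime accepts fewer digits on input.

```python
from datetime import datetime

stamp = datetime(2015, 3, 5, 23, 51, 47, 487000)

# strftime: %f is always exactly six digits (microseconds).
six_digits = stamp.strftime("%S.%f")

# There is no width modifier, so milliseconds are obtained by cutting the string.
millis = stamp.strftime("%S.%f")[:-3]

# strptime: %f accepts one to six digits and pads on the right.
parsed = datetime.strptime("47.487", "%S.%f")

print(six_digits, millis, parsed.microsecond)
```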
Another concern is how to support both integer seconds and optionally fractions.
In Python, there is an open issue to add support for this in some way, python/cpython#100929. No way has been picked yet.
In Ruby, I cannot find anything indicating support for this.
In R, with %OS, this is already handled automatically.
In .NET (DateTime.ParseExact), this can be handled by using the form that takes a list of permitted formats.
Also i'm also not sure if something like this should be coordinated with other date/time improvement
Good point. Since you specifically say %Y-%m-%dT%H:%M:%SZ -- hardcoding Z there may or may not always be desirable. Issue #1053 may be good to consider at the same time.
Great summary, learned a lot, and it seems we're not alone with this mess :) Personally I like the R approach with the %OS variants that optionally support fractions. That would minimize performance impact and also make it possible to use it in custom strptime formats.
@hvdijk is this something you would like to work on? not sure how much time/motivation i have to put into it atm.
I'm happy to have a more in-depth look into this and related strftime/strptime issues next week. Whether I'd also be able to do any coding for it I don't know yet; it's possible that this more in-depth look reveals more problems that need to be accounted for, but in that case at least writing down those problems should be helpful.
👍 sounds good
Representation in struct tm
There are two ways we can represent fractional seconds in jq's equivalent of struct tm (an array): we can either include it in the seconds field, or we can add a new field. Experimentation reveals that jq already includes fractional seconds in the seconds field:
$ ~/jq/jq -rnc '0.25 | gmtime'
[1970,0,1,0,0,0.25,4,0]
Unless a compelling reason is given to change this, I would suggest keeping this as it is now.
Representation in time_t
It seems obvious that if time t represents a particular moment in time, t+1 represents one second later, then t+0.25 should represent 250 milliseconds later. But this is less obvious when we use negative timestamps. The current behaviour of jq appears to be that if gmtime is given a fractional value, t - floor(t) represents the fractional seconds to be added to the time represented by trunc(t). That is, currently we have:
$ ~/jq/jq -rn '[-1.25, -1, -0.75, -0.5, -0.25, 0, 0.25, 0.5, 0.75, 1, 1.25][] | tostring + ": " + (gmtime | tostring)'
-1.25: [1969,11,31,23,59,59.75,3,364]
-1: [1969,11,31,23,59,59,3,364]
-0.75: [1970,0,1,0,0,0.25,4,0]
-0.5: [1970,0,1,0,0,0.5,4,0]
-0.25: [1970,0,1,0,0,0.75,4,0]
0: [1970,0,1,0,0,0,4,0]
0.25: [1970,0,1,0,0,0.25,4,0]
0.5: [1970,0,1,0,0,0.5,4,0]
0.75: [1970,0,1,0,0,0.75,4,0]
1: [1970,0,1,0,0,1,4,0]
1.25: [1970,0,1,0,0,1.25,4,0]
This breaks monotonicity and does not match Python, which shows:
$ TZ=UTC python3 -c 'from datetime import datetime
print(datetime.fromtimestamp(-1.25))'
1969-12-31 23:59:58.750000
Python simply makes it so that a time_t value of -1.25 means 1.25 seconds before a time_t value of 0.
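The difference between the two conventions can be made concrete with a small Python sketch. The function names are made up for this illustration; split_truncating models the behaviour described above (whole seconds from trunc(t), fraction from t - floor(t)), and split_flooring models what Python does.

```python
import math
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def split_truncating(t):
    """Whole seconds from trunc(t), fraction from t - floor(t): not monotonic below zero."""
    return math.trunc(t), t - math.floor(t)


def split_flooring(t):
    """Whole seconds from floor(t), so whole + fraction always equals t."""
    whole = math.floor(t)
    return whole, t - whole


def to_utc(t):
    """Convert a possibly negative fractional timestamp using the flooring convention."""
    whole, fraction = split_flooring(t)
    return EPOCH + timedelta(seconds=whole, microseconds=round(fraction * 1_000_000))


for t in (-1.25, -0.75, 0.25):
    print(t, split_truncating(t), split_flooring(t), to_utc(t))
```

With truncation, -0.75 is split into (0, 0.25), which is the same moment as +0.25; with flooring it is split into (-1, 0.25), which is 0.75 seconds before the epoch.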
Implementing this in jq will result in a subtle incompatibility. In my opinion, this is justifiable.
Locale
jq calls setlocale(LC_ALL, ""); at startup and uses the current locale for strftime/strptime. There is no way to change the locale within a jq script. It would be convenient to be able to do so, but I would not consider this necessary to be part of the same PR.
I believe that as strftime and strptime are locale-aware, whatever is created for fractional seconds should also use the locale-specific decimal separator. Pretending for a moment that we have %OS for this, I believe the correct behaviour for jq will be:
$ export LC_ALL=en_US.UTF-8
$ ~/jq/jq -rn '1234.5 | strftime("%H:%M:%OS")'
00:20:34.500000 (hypothetical, not current jq output)
$ export LC_ALL=nl_NL.UTF-8
$ ~/jq/jq -rn '1234.5 | strftime("%H:%M:%OS")'
00:20:34,500000 (hypothetical, not current jq output)
If a user wishes to parse a timestamp that is formatted with a decimal separator other than the one used in the user's locale, the user should set the locale prior to invoking jq, as in e.g. LC_ALL=C jq.
As for which format specifier to use, %OS exists already:
$ export LC_TIME=fa_IR.UTF-8
$ ~/jq/jq -rn '1234567890 | strftime("%Oy/%Om/%Od %OH:%OM:%OS")'
۰۹/۰۲/۱۳ ۲۳:۳۱:۳۰
I believe it would be wrong to change this to print ۰۹/۰۲/۱۳ ۲۳:۳۱:30.000000 or such. %OS already has an established meaning, jq already uses it in that established meaning, and breaking that would lead to an inconsistency in jq.
Standard strftime/strptime have E and O as modifier characters. %O* means to use alternative numeric symbols, %E* means to use an alternative era-based format. Neither seems appropriate.
However, we can take inspiration from fprintf and support %.3S to print seconds with three decimals. We can also support %.S to print seconds with a reasonable to-be-determined number of decimals. As for strptime, we can support %S to read integer seconds as we do now, preserving compatibility as much as we reasonably can, but support %.S to read seconds either with or without fractions.
There is no way to use alternative numeric symbols with fractional seconds: the way alternative digits are specified in locale data only tells us how to format values 0-99. Because this is provably impossible to support, I propose not even trying.
For both strftime and strptime, it should be possible to have a custom implementation that parses the format string, handles each literal character itself, handles %.(n)S itself, and handles every other format specifier by calling the standard strftime and strptime functions.
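As a rough illustration of the wrapper idea for the formatting direction, here is a Python sketch that expands %.S and %.nS itself and leaves every other specifier to the standard strftime. It is not jq code: strftime_frac, default_digits and decimal_point are names made up for this example, the default of six digits is an assumption (the proposal leaves that number open), the decimal separator is passed in rather than read from the locale, and the fraction is truncated rather than rounded so that it never carries into the seconds field.

```python
import math
import re
import time

# Matches a literal "%%", or the proposed "%.S" / "%.<digits>S"; nothing else is touched.
_TOKEN = re.compile(r"%%|%\.(\d*)S")


def strftime_frac(fmt, t, default_digits=6, decimal_point="."):
    """Format timestamp t (UTC), expanding %.S and %.nS before calling time.strftime."""
    whole = math.floor(t)
    fraction = t - whole
    tm = time.gmtime(whole)

    def expand(match):
        if match.group(0) == "%%":
            return "%%"  # keep escaped percent signs for the standard strftime
        digits = int(match.group(1)) if match.group(1) else default_digits
        if digits == 0:
            return "%S"
        # Truncate instead of rounding so the fraction never carries over.
        scaled = int(fraction * 10 ** digits)
        return "%S" + decimal_point + str(scaled).zfill(digits)

    # Everything that is not %.S or %.nS is still handled by the standard function.
    return time.strftime(_TOKEN.sub(expand, fmt), tm)


print(strftime_frac("%H:%M:%.3S", 1234.5))
print(strftime_frac("%H:%M:%.S", 1234.5, decimal_point=","))
```

A strptime counterpart is harder, since the wrapper has to know where the standard function stopped consuming input.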
Time zones
Time zones are permitted in strftime, using the %z and %Z specifiers, but these do not work properly in strftime, and are ignored in strptime (on glibc). These are issues #2475 and #2195.
$ TZ=Europe/Amsterdam ~/jq/jq -cnr '1234567890 | strftime("%Y-%m-%dT%H:%M:%S%Z")'
2009-02-13T23:31:30CET
$ TZ=Europe/Amsterdam ~/jq/jq -cnr '1234567890 | strftime("%Y-%m-%dT%H:%M:%S%z")'
2009-02-13T23:31:30+0000
$ ~/jq/jq -cnr '["2001-01-01T12:34:56+0000", "2001-01-01T12:34:56+0100", "2001-01-01T12:34:56+0200"][] | strptime("%Y-%m-%dT%H:%M:%S%z")'
[2001,0,1,12,34,56,1,0]
[2001,0,1,12,34,56,1,0]
[2001,0,1,12,34,56,1,0]
jq tries to handle this in its my_mktime function if the system provides a struct tm definition that includes time zone information, but given that jv2tm and tm2jv are hardcoded to only preserve the standard fields of struct tm, I do not see how this can possibly work. I believe the right thing to do here is to ensure that any time zone information that the system's struct tm provides is saved and restored, at which point the existing functions should just do the right thing.
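For comparison, this is what preserving the parsed offset looks like in Python, where %z in strptime produces an offset-aware value. It only illustrates the expected outcome for the three inputs used above; to_epoch is a name made up for this example.

```python
from datetime import datetime

FORMAT = "%Y-%m-%dT%H:%M:%S%z"


def to_epoch(text: str) -> float:
    """Parse a timestamp with a numeric UTC offset and return epoch seconds."""
    # The parsed datetime keeps the offset, so timestamp() accounts for it.
    return datetime.strptime(text, FORMAT).timestamp()


for text in ("2001-01-01T12:34:56+0000",
             "2001-01-01T12:34:56+0100",
             "2001-01-01T12:34:56+0200"):
    print(text, to_epoch(text))
```

The three inputs map to three different moments, one hour apart, instead of collapsing to the same broken-down time.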
Because, at least for common implementations, this does not appear to require any changes to strftime/strptime itself, I suspect this would not conflict with the changes needed for fractional seconds, and we can keep them separate.
Summary
I think for this particular issue:
- change the handling of negative time_t values to be what Python does
- create wrapper functions for strftime and strptime that accept %.S and %.3S (and other widths) to format and parse fractional seconds, using the decimal separator of the current locale
I think independent of this issue:
- preserve any platform-specific time zone information in struct tm (covered by existing issues)
- add a way to change the locale from within a jq script (not covered by an existing issue, but related issue: #2218)
Do these seem reasonable, or do you prefer something else?
Thanks 👍 that summary will be very useful for me or someone who wants to implement support for this.
By wrapping, you mean having functions that parse/massage the format in a way that strftime/strptime don't see or have to care about %.S etc.?
I wonder if an alternative is to import some strftime implementation in addition to the existing strptime implementation, modify both to support fractions, and then use them unconditionally for all platforms?
@nicowilliams when you have time: any thought?
By wrapping, you mean having functions that parse/massage the format in a way that strftime/strptime don't see or have to care about %.S etc.?
Basically yes, I was thinking we can implement our_strftime("%H:%M:%.S", t) as (pseudo-code) strftime("%H", t) + ":" + strftime("%M", t) + ":" + our_strftime_seconds(t). Or optionally combine more to reduce the number of calls to the standard strftime function. The advantage here is that everything that the underlying libc supports is already handled automatically, whereas a fully custom strftime implementation is hard to get right, especially that alternative digit stuff. And likewise for strptime (although that one is a little bit more complicated).
Yes, I think we're going to have to import an implementation of strftime()/strptime(). The alternative is to say "sorry, complain to your platform vendor/distro".
We might need to import an implementation of strftime()/strptime(). By using the one from the OS we end up exposing differences in functionality between all the OSes, which clearly annoys a lot of users including me.
Yes, I agree with the need.
I was just making some scripts that I wrote on Linux available cross-platform, so they will also work on Windows. One of the issues in doing so is that I need to modify many of the jq commands because, on Windows 11, the following error is issued when those jq commands are executed: strptime/1 only supports ISO 8601 on this platform.