Parsing ISO 8601 / RFC 3339 datetime string?
What's the correct way to parse an ISO 8601 / RFC 3339 datetime string?
This is very common in JSON communication.
On the server side we use Rust for our API and DateTime::to_rfc3339() to convert datetimes to String for the JSON API, which can also be expressed with the format string "%+":
> %+: Same to %Y-%m-%dT%H:%M:%S%.f%:z, i.e. 0, 3, 6 or 9 fractional digits for seconds and colons in the time zone offset.
So it has a variable number of digits for the fractional seconds, depending on the timestamp in question.
If it falls on a second boundary, it has 0 fractional second digits, like "1970-01-01T00:00:00+00:00".
Also it has the timezone at the end.
How can I parse this ISO 8601 / RFC 3339 datetime string in my PureScript frontend?
I think at the moment writing a parser using purescript-parsing or something like that is probably your best bet, as I guess the format language we have in here at the moment isn't expressive enough for that.
@Boscop you could build multiple formats and use unformatParser like this:

    myParser
      = try (unformatParser format1)
      <|> try (unformatParser format2)
      <|> unformatParser format3

    parse str = runParser str myParser

Note: I think you do need try here. It might be possible to order the formats in such a way that it's not needed, but otherwise it can't be avoided.
Also you can just use unformat and <|>:

    parse str
      = unformat format1 str
      <|> unformat format2 str
      <|> unformat format3 str

@safareli Thanks. But I also need support for microseconds like "2017-11-21T05:16:29.120116+00:00", and it doesn't support that (only milliseconds):
https://github.com/slamdata/purescript-formatters/blob/v3.0.0/src/Data/Formatter/DateTime.purs#L122
Would it be possible to add support for microseconds (6 digits) (and maybe nanoseconds (9 digits))? :)
Also, is there a way that I only have to parse the format string once at the first use, and then not on subsequent uses? With a lazy variable somehow?
There'll be a bit of a problem there since the DateTime representation that is being parsed/formatted is only millisecond-precise.
You could just create the format string at the top level and re-use it, then the parse cost is at startup. Lazy might well be another option. But I'd suggest constructing the format commands directly rather than using the string parsing method as another option: #22 🙂
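A minimal sketch of what constructing the format commands directly might look like. It assumes the FormatterCommand constructors exported by Data.Formatter.DateTime (YearFull, MonthTwoDigits, and so on) and that Formatter is a List of those commands. The module name Example.DirectFormat and the names fmtRfc3339Utc and parseRfc3339Utc are hypothetical, chosen only for this illustration.

```purescript
module Example.DirectFormat where

import Data.DateTime (DateTime)
import Data.Either (Either)
import Data.Formatter.DateTime (Formatter, FormatterCommand(..), unformat)
import Data.List (fromFoldable)

-- Intended to behave like the format string "YYYY-MM-DDTHH:mm:ss+00:00",
-- but built from constructors, so no format string is parsed at runtime.
-- The trailing "+00:00" is a literal placeholder, i.e. only UTC input matches.
fmtRfc3339Utc :: Formatter
fmtRfc3339Utc = fromFoldable
  [ YearFull
  , Placeholder "-"
  , MonthTwoDigits
  , Placeholder "-"
  , DayOfMonthTwoDigits
  , Placeholder "T"
  , Hours24
  , Placeholder ":"
  , MinutesTwoDigits
  , Placeholder ":"
  , SecondsTwoDigits
  , Placeholder "+00:00"
  ]

-- Left carries the error message, Right the parsed DateTime.
parseRfc3339Utc :: String -> Either String DateTime
parseRfc3339Utc = unformat fmtRfc3339Utc
```

Because fmtRfc3339Utc is an ordinary top-level value, every call to parseRfc3339Utc refers to the same definition, and an invalid format becomes a compile error rather than a runtime Left.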
@garyb But how can I make it re-use the evaluated value?
I currently do this:

    fmt_rfc3339 = parseFormatString "YYYY-MM-DDTHH:mm:ss+00:00"
    fmt_german = parseFormatString "DD.MM.YYYY, HH:mm"

    humanTime s = either id id do
      decode <- fmt_rfc3339
      encode <- fmt_german
      datetime <- unformat decode s
      pure $ format encode datetime

Is that the most efficient way to do it?
> There'll be a bit of a problem there since the DateTime representation that is being parsed/formatted is only millisecond-precise.

That's OK, it can round to the nearest millisecond, or even just truncate/ignore the extra digits. It should still be able to parse it though. :)
Yes, parseFormatString parses a format string into a Formatter value. If you declare the formats at the top level, you can also do this, so that an invalid format gives you an error at startup:

    fmt_rfc3339 :: Formatter
    fmt_rfc3339 = case parseFormatString "YYYY-MM-DDTHH:mm:ss+00:00" of
      Left err -> unsafeCrashWith $ "format must have been valid " <> show err
      Right x -> x

    fmt_german :: Formatter
    fmt_german = case parseFormatString "DD.MM.YYYY, HH:mm" of
      Left err -> unsafeCrashWith $ "format must have been valid " <> show err
      Right x -> x

    humanTime s = either id id do
      datetime <- unformat fmt_rfc3339 s
      pure $ format fmt_german datetime

Also, as @garyb noted, you can just build these formats directly, as in #22, and you wouldn't need parseFormatString.
If your {nano,micro}seconds are at the end of the input string, and you are willing to play with parser combinators, you can use unformatParser to get the datetime and then discard the rest of the string. (runP, which is used to create the unformat function, adds an eof parser to unformatParser.)
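That idea could be sketched roughly as follows. This is an illustration under assumptions, not tested against a specific release: runParser and parseErrorMessage are taken from Text.Parsing.Parser (newer releases of purescript-parsing name the module Parsing), and the names Example.DiscardRest, secondsPrefix and parseIgnoringRest are hypothetical. Note that everything after the seconds is dropped, including the offset, so this only gives correct results if the server always emits UTC.

```purescript
module Example.DiscardRest where

import Data.Bifunctor (lmap)
import Data.DateTime (DateTime)
import Data.Either (Either)
import Data.Formatter.DateTime (Formatter, FormatterCommand(..), unformatParser)
import Data.List (fromFoldable)
import Text.Parsing.Parser (parseErrorMessage, runParser)

-- Covers only the "YYYY-MM-DDTHH:mm:ss" prefix of the input.
secondsPrefix :: Formatter
secondsPrefix = fromFoldable
  [ YearFull
  , Placeholder "-"
  , MonthTwoDigits
  , Placeholder "-"
  , DayOfMonthTwoDigits
  , Placeholder "T"
  , Hours24
  , Placeholder ":"
  , MinutesTwoDigits
  , Placeholder ":"
  , SecondsTwoDigits
  ]

-- runParser does not require the whole input to be consumed, so a trailing
-- ".120116+00:00" or ".055246Z" is simply left unparsed and discarded.
-- ASSUMPTION: the discarded offset is always UTC.
parseIgnoringRest :: String -> Either String DateTime
parseIgnoringRest str =
  lmap parseErrorMessage (runParser str (unformatParser secondsPrefix))
```

With this, the fractional digits are truncated rather than rounded; keeping millisecond precision would need an extra parser for the fraction after the seconds.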
Would you accept a PR that adds formatters (UUU,MicrosecondsRounded) and (NNN,NanosecondsRounded)?
Currently, I can't parse this: "2019-08-07T10:16:58.055246Z"
EDIT: Sign/constructor change to better reflect that rounding takes place