Speed of `julia_timeline()` is poor for long time series
`julia_timeline()` is required to handle time stamps properly in Julia. It fixes the following issues:
rm(list = ls())
library(JuliaCall)
julia_setup()
# Define example time series
problem <- seq.POSIXt(as.POSIXct("2016-01-01 12:00:00", tz = "UTC"), by = "2 mins", length.out = 10)
works <- seq.POSIXt(as.POSIXct("2016-01-01 12:00:00", tz = "UTC"), as.POSIXct("2016-02-01", tz = "UTC"), by = "2 mins")
# The structure & attributes are identical
str(problem)
str(works)
attributes(problem)
attributes(works)
# But internally the problematic time series is encoded as integers
unclass(problem) |> str() # integer
unclass(works) |> str() # numeric
# `seq.POSIXt()` produces integers in Julia if used with `length.out`
julia_assign("stimeline_1", problem)
julia_command('stimeline_1')
# `seq.POSIXt()` works if `length.out` is not used
julia_assign("stimeline_2", works)
julia_command('stimeline_2')
But `julia_timeline()` is slow because of `as.POSIXct(format())`:
# as.POSIXct(format()) works but is extremely slow for long time series
stimeline_3 <- as.POSIXct(format(problem, "%Y-%m-%d %H:%M:%S"), tz = lubridate::tz(problem))
julia_assign("stimeline_3", stimeline_3)
julia_command('stimeline_3')
# `fasttime::fastPOSIXct(seq.POSIXt())` is faster & works whether or not `length.out` is used
# > But `fasttime` is still slow (e.g., 10 s) for long time series
stimeline_4 <- fasttime::fastPOSIXct(problem, "UTC")
julia_assign("stimeline_4", stimeline_4)
julia_command('stimeline_4')
stimeline_5 <- fasttime::fastPOSIXct(works, "UTC")
julia_assign("stimeline_5", stimeline_5)
julia_command('stimeline_5')
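Since the two series differ only in how the underlying seconds are stored, a cheaper route might be to change the storage mode directly instead of round-tripping through character strings. The sketch below is an assumption, not something tested in this issue: it builds an integer-backed `POSIXct` by hand (so it does not depend on how a given R version implements `seq.POSIXt()`), and whether JuliaCall then translates the result correctly would still need to be checked with `julia_assign()`.

```r
# Build an integer-backed POSIXct by hand: 10 time stamps, 2 minutes apart,
# starting at 2016-01-01 12:00:00 UTC (1451649600 s since the epoch)
secs <- 1451649600L + (0:9) * 120L
int_timeline <- structure(secs, class = c("POSIXct", "POSIXt"), tzone = "UTC")
typeof(int_timeline) # "integer"

# Coerce the storage mode to double; class and tzone attributes are kept
dbl_timeline <- int_timeline
storage.mode(dbl_timeline) <- "double"
typeof(dbl_timeline) # "double"

# The time stamps themselves are unchanged
all(dbl_timeline == int_timeline)
```

If this turns out to work with JuliaCall, it would avoid formatting and re-parsing every time stamp, which is where `as.POSIXct(format())` spends its time.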
Possible optimisations include:
- Move `julia_timeline()` to `assemble_*()` functions so that it is called once per dataset. But this is less safe for custom datasets.
- Improve speed in `julia_timeline()` by only fixing timelines assembled by `seq(..., length.out)` & using `fasttime` if required and available:
if (inherits(unclass(.x), "integer")) {
  warn("Use `seq.POSIXt()` with `from`, `to` and `by` rather than `length.out` for faster handling of time stamps.")
  if (lubridate::tz(.x) %in% c("GMT", "UTC") && requireNamespace("fasttime", quietly = TRUE)) {
    .x <- fasttime::fastPOSIXct(.x, tz = lubridate::tz(.x))
  } else {
    warn("Use `fasttime` for faster formatting of time stamps.")
    .x <- as.POSIXct(format(.x, "%Y-%m-%d %H:%M:%S"), tz = lubridate::tz(.x))
  }
  check_inherits(unclass(.x), "numeric")
}
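To make the branch easier to try in isolation, here is a minimal standalone sketch of the same logic using base R only. The name `fix_timeline()` is hypothetical and this is not the package's implementation: `warn()` and `check_inherits()` are replaced by base checks, the time zone is read from the `tzone` attribute rather than via `lubridate::tz()`, and the time stamps are formatted explicitly before being passed to `fasttime::fastPOSIXct()`.

```r
# Hypothetical helper mirroring the proposed branch; not the package's code
fix_timeline <- function(x) {
  stopifnot(inherits(x, "POSIXct"))
  # Double-backed time stamps need no fixing, so return them untouched
  if (!is.integer(unclass(x))) {
    return(x)
  }
  tz <- attr(x, "tzone")
  if (is.null(tz)) tz <- ""
  # Format once; both parsers below read the same strings
  chr <- format(x, "%Y-%m-%d %H:%M:%S")
  if (tz %in% c("GMT", "UTC") && requireNamespace("fasttime", quietly = TRUE)) {
    # fasttime interprets time stamps as GMT/UTC only
    x <- fasttime::fastPOSIXct(chr, tz = tz)
  } else {
    x <- as.POSIXct(chr, tz = tz)
  }
  # Post-condition: the result is stored as doubles
  stopifnot(is.double(unclass(x)))
  x
}

# Usage with a hand-built integer-backed timeline (2-minute steps from
# 2016-01-01 12:00:00 UTC)
secs <- 1451649600L + (0:9) * 120L
x <- structure(secs, class = c("POSIXct", "POSIXt"), tzone = "UTC")
y <- fix_timeline(x)
typeof(y)
```

Returning early for double-backed input means timelines built with `from`, `to` and `by` pay no formatting cost at all.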
The second optimisation should fix speed issues in `set_yobs_vect()`, where most of the time is taken by `julia_timeline()`. The actual assignment of a `list` of `data.table`s in Julia via `julia_assign()` is fast, taking up to ~1 s per `data.table` for a `data.table` with ~6 million rows.
Following ea2917f, in the `patter-trout` project, the time required to set observations has gone from > 1 min to ~ 5 s.