edwardlavender/patter

Speed of `julia_timeline()` is poor for long time series

Closed this issue · 1 comment

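julia_timeline() is required to handle time stamps properly in Julia. It works around the following issue, in which a time series built by seq.POSIXt() with length.out is stored as integers in R and arrives in Julia as integers: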

rm(list = ls())

library(JuliaCall)
julia_setup()

# Define example time series
problem <- seq.POSIXt(as.POSIXct("2016-01-01 12:00:00", tz = "UTC"), by = "2 mins", length.out = 10)
works   <- seq.POSIXt(as.POSIXct("2016-01-01 12:00:00", tz = "UTC"), as.POSIXct("2016-02-01", tz = "UTC"), by = "2 mins")

# The structure & attributes are identical
str(problem)
str(works)
attributes(problem)
attributes(works)

# But internally the problematic time series is encoded as integers
unclass(problem) |> str()  # integer
unclass(works) |> str()    # numeric 

# `seq.POSIXt()` produces integers in Julia if used with `length.out`
julia_assign("stimeline_1", problem)
julia_command('stimeline_1') 

# `seq.POSIXt()` works if `length.out` is not used
julia_assign("stimeline_2", works)
julia_command('stimeline_2')

However, julia_timeline() is slow because it relies on as.POSIXct(format()):

# as.POSIXct(format()) works but is extremely slow for long time series
stimeline_3 <- as.POSIXct(format(problem, "%Y-%m-%d %H:%M:%S"), tz = lubridate::tz(problem))
julia_assign("stimeline_3", stimeline_3)
julia_command('stimeline_3')

# `fasttime::fastPOSIXct(seq.POSIXt())` is faster & works whether or not `length.out` is used
# > But fasttime:: is still slow (e.g., 10 s) for long time series
stimeline_4 <- fasttime::fastPOSIXct(problem, "UTC")
julia_assign("stimeline_4", stimeline_4)
julia_command('stimeline_4')
stimeline_5 <- fasttime::fastPOSIXct(works, "UTC")
julia_assign("stimeline_5", stimeline_5)
julia_command('stimeline_5')
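
The storage-type difference and the effect of the as.POSIXct(format()) round trip can also be checked in base R, without a Julia session. The snippet below is a minimal sketch: it builds an integer-backed time series directly with .POSIXct() (an assumption made so that the example does not depend on how seq.POSIXt() behaves in a given R version), and the object names are illustrative only.

```r
# Build an integer-backed POSIXct directly (illustrative stand-in for `problem`)
# 1451649600 is 2016-01-01 12:00:00 UTC; steps are 2 minutes (120 s)
x_int <- .POSIXct(1451649600L + 0:9 * 120L, tz = "UTC")

# The class is POSIXct, but the underlying storage is integer
class(x_int)
typeof(x_int)

# Round trip through text, as in the as.POSIXct(format()) approach
x_dbl <- as.POSIXct(format(x_int, "%Y-%m-%d %H:%M:%S"), tz = "UTC")

# The storage is now double and the time stamps are unchanged
typeof(x_dbl)
all(x_dbl == x_int)
```

The check itself (typeof()) is cheap; the cost lies in formatting and re-parsing every time stamp, which is why it pays to apply the round trip only when the storage is actually integer.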

Possible optimisations include:

  1. Move julia_timeline() to assemble_*() functions so that it is called once per dataset. But this is less safe for custom datasets.

  2. Improve speed in julia_timeline() by only fixing timelines assembled by seq(..., length.out) & using fasttime if required and available:

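  # Only fix timelines that are stored as integers (e.g., from seq(..., length.out))
  if (inherits(unclass(.x), "integer")) {
    warn("Use `seq.POSIXt()` with `from`, `to` and `by` rather than `length.out` for faster handling of time stamps.")
    if (lubridate::tz(.x) %in% c("GMT", "UTC") && requireNamespace("fasttime", quietly = TRUE)) {
      # Fast path: UTC/GMT time stamps & fasttime is installed
      .x <- fasttime::fastPOSIXct(.x, tz = lubridate::tz(.x))
    } else {
      # Slow path: round trip through formatted text
      warn("Use `fasttime` for faster formatting of time stamps.")
      .x <- as.POSIXct(format(.x, "%Y-%m-%d %H:%M:%S"), tz = lubridate::tz(.x))
    }
    # Confirm that the fixed timeline is now stored as numeric
    check_inherits(unclass(.x), "numeric")
  }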
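
For experimentation outside the package, option 2 can be sketched as a stand-alone function. This is a minimal sketch under stated assumptions, not the package's julia_timeline(): the name fix_timeline() is hypothetical, base warning() and attr(.x, "tzone") stand in for warn() and lubridate::tz(), and the time stamps are formatted to text explicitly before being passed to fasttime::fastPOSIXct().

```r
# Hypothetical stand-alone helper (not part of patter)
fix_timeline <- function(.x) {
  stopifnot(inherits(.x, "POSIXct"))
  # Double-backed timelines are returned untouched, so the common case is cheap
  if (typeof(.x) != "integer") {
    return(.x)
  }
  warning("Use `seq.POSIXt()` with `from`, `to` and `by` rather than `length.out` ",
          "for faster handling of time stamps.", call. = FALSE)
  # Time zone of the input ("" means the local time zone)
  tz <- attr(.x, "tzone")
  if (is.null(tz)) tz <- ""
  tz <- tz[1]
  # Round trip through text, which yields a double-backed POSIXct
  txt <- format(.x, "%Y-%m-%d %H:%M:%S")
  if (tz %in% c("GMT", "UTC") && requireNamespace("fasttime", quietly = TRUE)) {
    # Fast path: fasttime parses text as GMT/UTC time stamps
    out <- fasttime::fastPOSIXct(txt, tz = tz)
  } else {
    # Slow path: base R parsing
    out <- as.POSIXct(txt, tz = tz)
  }
  stopifnot(typeof(out) == "double")
  out
}

# Example: an integer-backed timeline is converted; a double-backed one is not
x_int <- .POSIXct(1451649600L + 0:9 * 120L, tz = "UTC")
x_fix <- suppressWarnings(fix_timeline(x_int))
typeof(x_fix)
```

The early return is the main design choice: timelines built with `from`, `to` and `by` skip the formatting step entirely, so only timelines that need fixing pay for it.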

This should fix speed issues in set_yobs_vect(), where most of the time is taken by julia_timeline(). The actual assignment of a list of data.tables in Julia via julia_assign() is fast, taking up to ~1 s per data.table for a data.table with ~6 million rows.

Following ea2917f, in the patter-trout project, the time required to set observations has gone from > 1 min to ~ 5 s.