JeffreySarnoff/TimesDates.jl

Performance hit from type instability in CompoundPeriods.CompoundPeriod

logicchains opened this issue · 3 comments

In canonical.jl, the function canonical:

function canonical(cperiod::ReverseCompoundPeriod)
    result = CompoundPeriod()
    for p in cperiod
        result = (result + sum(fldmod(p)))
    end
    return sum(fldmod(sum(result)))
end

Is not type stable. This makes arithmetic on different kinds of TimeDate quite slow:

@BenchmarkTools.btime TimesDates.TimeDate(Dates.unix2datetime(23)) + TimesDates.Millisecond(23)

outputs 79.224 μs (398 allocations: 69.86 KiB), and @code_warntype shows it contains an Any.

This is around three orders of magnitude slower than the operation would take if no type instability is present, and makes a noticeable difference when operating on large arrays/dataframes (for a million items, it adds ~79 seconds of overhead).

The problem seems inherent to using CompoundPeriod, due to it using an array with an abstract element type:

mutable struct CompoundPeriod <: AbstractTime
    periods::Array{Period, 1}

Even simple use cases produce any:

function slow(p::CompoundPeriods.CompoundPeriod)
    return CompoundPeriods.CompoundPeriod() + sum(fldmod(p.periods[1]))
end
 
@code_warntype slow(Dates.CompoundPeriod(Dates.Time(23)))
Body::Any
1 ─ %1 = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Array{Dates.Period,1}, svec(Any, Int64), :(:ccall), 2, Array{Dates.Period,1}, 0, 0))::Array{Dates.Period,1}
│   %2 = invoke Dates.CompoundPeriod(%1::Array{Dates.Period,1})::Dates.CompoundPeriod
│   %3 = (Base.getfield)(p, :periods)::Array{Dates.Period,1}
│   %4 = (Base.arrayref)(true, %3, 1)::Dates.Period
│   %5 = (Main.fldmod)(%4)::Any
│   %6 = (Main.sum)(%5)::Any
│   %7 = (%2 + %6)::Any
└──      return %7

I don't see any way to fix this short of not using CompoundPeriod. If Period was a concrete rather than abstract type, with an enum member used to represent what kind of period it was, the performance issues would go away, but this would mean new period types could not be added after the fact.

One possible solution would be to use a struct containing a single member for each possible period type that a compound period might contain, all initialised to zero. Another is to stick to a UInt64 as the internal representation, like how nanosecond-precision Pandas datetimes are represented.

Thank you for the deeper dive. As you know, CompoundPeriod is introduced by the stdlib Dates. CompoundPeriods.jl was written to provide TimesDates.jl a fuller palette from which to reach over all TimePeriods. CompoundPeriods was born of necessity -- Dates was not flexible enough in its own. I share your desire for a stable and, so, cleaner implementation. Please continue to offer thoughts. I'll contemplate it too.

mutable struct TimePeriods <: AbstractTime
    nanosecond::Nanosecond
    microsecond::Microsecond
    ...
    hour::Hour
    day::Day
    week::Week

then always construct 'TimePeriods(x::CompoundPeriod)' and compute using them, finally coverting back to Compound period.
(months, years either folded in or, as I have done in the past, keep then as distinct Annualesque vs Diurnalish unit set.)

Improved speed of canonical in CompoundPeriods v3.1 (now 2x Dates.canonicalize).