dysonance/Temporal.jl

Simpler API for resampling timeseries

femtotrader opened this issue · 7 comments

You might be interested in a simpler API design for resampling.
See TimeSeriesResampler.jl and TimeFrames.jl.

Edit 2017/12/01
WIP: https://gist.github.com/femtotrader/89af55ef91f26f952835fd737a14847b

@femtotrader I think both of the packages you mention above have some serious potential use value down the line. Currently, as I am the only one working on Temporal, there are some other things that I think need to be firmed up before these features can be used in a robust manner, but I will definitely keep an eye on how these packages develop and hopefully tack on their functionality later on.

Here is an example of a simpler API implementation

This example resamples with a 5-hour timeframe.
I'm not using TimeFrames.jl here, just an anonymous function:

tf = dt -> floor(dt, Dates.Hour(5))

It uses Temporal's collapse function:

using Temporal
import Base: mean


struct TimeSeriesResampler
    ts
    tf
end

function names(ts::TS, dim)
    if dim == 1
        ts.index
    elseif dim == 2
        ts.fields
    else
        # Exception is an abstract type and cannot be constructed directly
        throw(ArgumentError("dim must be 1 or 2"))
    end
end

function resample(ts, tf)
    TimeSeriesResampler(ts, tf)
end

function mean(resampler::TimeSeriesResampler)
    ts = resampler.ts
    f_group = resampler.tf
    z = Dates.Millisecond(0)  # doesn't work simply with z=0
    at = [false;diff(f_group.(names(ts,1))).!=z]
    collapse(ts, at, fun=mean)
end

srand(1234)
idx=DateTime(2010,1,1):Dates.Hour(1):DateTime(2017,1,1)-Dates.Hour(1)
n=length(idx)
price=100+cumsum(2*(rand(n)-0.5))
volume=rand(n)*1000
ts = TS([price volume], collect(idx), [:price, :volume])
#println(ts)
ts_price = ts[:price]
println(ts_price)

tf = dt -> floor(dt, Dates.Hour(5))
println(mean(resample(ts_price, tf)))

Unfortunately the code is buggy; the resulting TS looks like this:

12273x1 Temporal.TS{Float64,DateTime}: 2010-01-01T03:00:00 to 2016-12-31T19:00:00
Index                 price    
2010-01-01T03:00:00   100.6282 
2010-01-01T08:00:00   101.2121 
2010-01-01T13:00:00   100.2618 
2010-01-01T18:00:00   99.5388  
2010-01-01T23:00:00   100.599

I was expecting

12273x1 Temporal.TS{Float64,DateTime}: 2010-01-01T00:00:00 to 2016-12-31T20:00:00
Index                 price    
2010-01-01T00:00:00   
2010-01-01T05:00:00   
2010-01-01T10:00:00   
2010-01-01T15:00:00   
2010-01-01T20:00:00   

Any idea what is going on?
Is it a bug in my implementation, a bug in the collapse implementation, or a misunderstanding of how things work?

It seems that this issue has to do with the floor function you're using. Take a look at floor(ts.index, Dates.Hour(5)), for example. The first three elements are all 2009-12-31T22:00:00, the next five are 2010-01-01T03:00:00, and so on. floor rounds relative to a fixed epoch rather than to midnight of each day, so 5-hour buckets do not start at 00:00. That seems to be where things break down.

Give this a try:

collapse(ts_price, t -> hour.(t) .% 5 .== 0, fun=mean)
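To make the difference between the two groupings concrete, here is a minimal sketch using only the Dates standard library (no Temporal). It assumes Julia 1.x, where `using Dates` is required, and the short two-day `idx` is just an illustrative stand-in for the thread's index.

```julia
using Dates  # Julia 1.x; on 0.6 Dates is available through Base

# Hourly index covering a bit more than one day (illustrative only)
idx = collect(DateTime(2010, 1, 1):Hour(1):DateTime(2010, 1, 2, 1))

# floor counts 5-hour buckets from a fixed rounding epoch, not from midnight,
# so the first bucket touching 2010-01-01 starts the previous evening
buckets = floor.(idx, Hour(5))
first_bucket = buckets[1:3]    # hours 00, 01, 02 share one bucket
second_bucket = buckets[4:8]   # hours 03 through 07 share the next one

# The hour-based predicate restarts at midnight every day instead
flags = hour.(idx) .% 5 .== 0
flagged_hours = hour.(idx[flags])
```

With the predicate, the flagged hours are 0, 5, 10, 15, 20 and then 0 again, so the last group of each day covers only four hours (20:00 to 23:00).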

Good catch @dysonance

...
function mean(resampler::TimeSeriesResampler)
    ts = resampler.ts
    f_group = resampler.tf
    collapse(ts, f_group, fun=mean)
end
...
tf = t -> Dates.hour.(t) .% 5 .== 0
println(mean(resample(ts_price, tf)))

displays

12785x1 Temporal.TS{Float64,DateTime}: 2010-01-01T00:00:00 to 2016-12-31T20:00:00
Index                 price    
2010-01-01T00:00:00   100.1817 
2010-01-01T05:00:00   100.9888 
2010-01-01T10:00:00   101.1463 
2010-01-01T15:00:00   100.0456 
2010-01-01T20:00:00   99.3642  
2010-01-02T00:00:00   101.7938
...

I would like Julia to have a function chaining / pipe syntax,
but that has been an open issue for a long time (JuliaLang/julia#5571). Something like

ts_price.resample(tf).mean()

(or similar, if dot syntax is not possible) would be great.
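In the meantime, something close to chaining can be approximated with Julia's built-in `|>` operator and a curried `resample`. The sketch below is only an illustration under stated assumptions: `Resampler`, the one-argument `resample(tf)` method, and the named tuple `data` are hypothetical stand-ins that work on plain vectors, not on a Temporal.TS, and it assumes Julia 1.x.

```julia
using Dates
import Statistics: mean

# Hypothetical stand-in for the thread's TimeSeriesResampler,
# holding plain vectors so the sketch runs without Temporal
struct Resampler
    index::Vector{DateTime}
    values::Vector{Float64}
    tf
end

resample(index, values, tf) = Resampler(index, values, tf)
# Curried form (an assumption of this sketch): returns a function awaiting the data
resample(tf) = data -> resample(data.index, data.values, tf)

# Average the values inside each bucket produced by the grouping function
function mean(r::Resampler)
    buckets = r.tf.(r.index)
    labels = unique(buckets)
    return labels, [mean(r.values[buckets .== b]) for b in labels]
end

data = (index = collect(DateTime(2010, 1, 1):Hour(1):DateTime(2010, 1, 1, 7)),
        values = collect(1.0:8.0))

# Reads left to right, much like ts_price.resample(tf).mean()
labels, means = data |> resample(dt -> floor(dt, Hour(4))) |> mean
```

The curried method is what lets `resample(tf)` sit in the middle of the pipe; without it, each step would need an explicit anonymous function.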

I think my implementation of TimeFrame in TimeFrames.jl is buggy (because of this floor/ceil use)

You might try with the following index (with a 1 second offset)

idx=DateTime(2010,1,1):Dates.Hour(1):DateTime(2017,1,1)-Dates.Hour(1)
idx = collect(idx) + Dates.Second(1)

Here is what we get

12785x1 Temporal.TS{Float64,DateTime}: 2010-01-01T00:00:01 to 2016-12-31T20:00:01
Index                 price    
2010-01-01T00:00:01   100.1817 
2010-01-01T05:00:01   100.9888 
2010-01-01T10:00:01   101.1463 
2010-01-01T15:00:01   100.0456 
2010-01-01T20:00:01   99.3642  
2010-01-02T00:00:01   101.7938

Resampling with a 5-hour timeframe should return a TS from 2010-01-01T00:00:00 to 2016-12-31T20:00:00 (i.e. without the 1-second offset).

Let's use a TimeFrame of 4 hours instead, since 24 hours is a multiple of 4 but not of 5. Choosing 5 was a poor test case.

using Temporal
import Base: mean


struct TimeSeriesResampler
    ts
    tf
end

function names(ts::TS, dim)
    if dim == 1
        ts.index
    elseif dim == 2
        ts.fields
    else
        # Exception is an abstract type and cannot be constructed directly
        throw(ArgumentError("dim must be 1 or 2"))
    end
end

function resample(ts, tf)
    TimeSeriesResampler(ts, tf)
end

function mean(resampler::TimeSeriesResampler)
    ts = resampler.ts
    f_group = resampler.tf
    z = Dates.Millisecond(0)  # doesn't work simply with z=0
    at = [false;diff(f_group.(names(ts,1))).!=z]
    collapse(ts, at, fun=mean)
end

srand(1234)
idx=DateTime(2010,1,1):Dates.Hour(1):DateTime(2017,1,1)-Dates.Hour(1)
idx = collect(idx) + Dates.Second(1)
n=length(idx)
price=100+cumsum(2*(rand(n)-0.5))
volume=rand(n)*1000
ts = TS([price volume], idx, [:price, :volume])
ts_price = ts[:price]
println(ts_price)

tf = dt -> floor(dt, Dates.Hour(4))
println(mean(resample(ts_price, tf)))

But I still don't understand why the 1-second offset is still present in the TS index:

15341x1 Temporal.TS{Float64,DateTime}: 2010-01-01T04:00:01 to 2016-12-31T20:00:01
Index                 price    
2010-01-01T04:00:01   100.7737 
2010-01-01T08:00:01   101.3009 
2010-01-01T12:00:01   100.2978 
2010-01-01T16:00:01   99.6753  
2010-01-01T20:00:01   99.317   

Given that the initial TS looks like

61368x1 Temporal.TS{Float64,DateTime}: 2010-01-01T00:00:01 to 2016-12-31T23:00:01
Index                 price    
2010-01-01T00:00:01   100.1817 
2010-01-01T01:00:01   100.7153 
2010-01-01T02:00:01   100.8478 
2010-01-01T03:00:01   100.7679 
2010-01-01T04:00:01   101.356  
2010-01-01T05:00:01   102.0643 
2010-01-01T06:00:01   101.4654 

and

julia> floor(DateTime("2010-01-01T00:00:01"), Dates.Hour(4))
2010-01-01T00:00:00

julia> floor(DateTime("2016-12-31T23:00:01"), Dates.Hour(4))
2016-12-31T20:00:00

we should expect a resampled TS index running from 2010-01-01T00:00:00 to 2016-12-31T20:00:00.
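For reference, here is a minimal sketch of the expected behaviour, where each group is labelled with its floored timestamp rather than with one of the original index values. It deliberately does not use Temporal's collapse: `resample_mean` is a hypothetical helper written for this illustration, it operates on plain vectors, and it assumes Julia 1.x.

```julia
using Dates
using Statistics: mean

# Hypothetical helper (not part of Temporal.jl): average `values` within each
# bucket returned by `tf`, labelling every bucket with its floored timestamp
function resample_mean(index::Vector{DateTime}, values::Vector{Float64}, tf)
    buckets = tf.(index)       # bucket label for every row
    labels = unique(buckets)   # order of first appearance; sorted if `index` is sorted
    means = [mean(values[buckets .== b]) for b in labels]
    return labels, means
end

# Twelve hourly observations, each shifted by one second as in the thread
idx = collect(DateTime(2010, 1, 1):Hour(1):DateTime(2010, 1, 1, 11)) .+ Second(1)
vals = collect(1.0:12.0)

labels, means = resample_mean(idx, vals, dt -> floor(dt, Hour(4)))
```

Because the labels come from the grouping function itself, the 1-second offset disappears from the result. The per-bucket mask makes this quadratic in the worst case, which is fine for a sketch but not for a 61368-row series.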