Simpler API for resampling timeseries
femtotrader opened this issue · 7 comments
You might be interested in a simpler API design for resampling.
See TimeSeriesResampler.jl and TimeFrames.jl
Edit 2017/12/01
WIP: https://gist.github.com/femtotrader/89af55ef91f26f952835fd737a14847b
@femtotrader I think both of the packages you mention above have some serious potential use value down the line. Currently, as I am the only one working on Temporal, there are some other things that I think need to be firmed up before these features can be used in a robust manner, but I will definitely keep an eye on how these packages develop and hopefully tack on their functionality later on.
Here is an example of a simpler API implementation.
This example resamples with a 5-hour timeframe.
I'm not using TimeFrames.jl but an anonymous function:
tf = dt -> floor(dt, Dates.Hour(5))
It uses the collapse function.
using Temporal
import Base: mean

# Wraps a time series together with the grouping function (the "timeframe")
struct TimeSeriesResampler
    ts
    tf
end

# Row names (dim == 1) are the index, column names (dim == 2) are the fields
function names(ts::TS, dim)
    if dim == 1
        ts.index
    elseif dim == 2
        ts.fields
    else
        throw(ArgumentError("dim must be 1 or 2"))
    end
end

function resample(ts, tf)
    TimeSeriesResampler(ts, tf)
end

function mean(resampler::TimeSeriesResampler)
    ts = resampler.ts
    f_group = resampler.tf
    z = Dates.Millisecond(0) # doesn't work simply with z=0
    # true wherever the floored timestamp differs from the previous one
    at = [false; diff(f_group.(names(ts, 1))) .!= z]
    collapse(ts, at, fun=mean)
end

srand(1234)
idx = DateTime(2010,1,1):Dates.Hour(1):DateTime(2017,1,1)-Dates.Hour(1)
n = length(idx)
price = 100 + cumsum(2*(rand(n)-0.5))
volume = rand(n)*1000
ts = TS([price volume], collect(idx), [:price, :volume])
#println(ts)
ts_price = ts[:price]
println(ts_price)
tf = dt -> floor(dt, Dates.Hour(5))
println(mean(resample(ts_price, tf)))
Unfortunately the code is quite buggy, as the resulting TS looks like
12273x1 Temporal.TS{Float64,DateTime}: 2010-01-01T03:00:00 to 2016-12-31T19:00:00
Index price
2010-01-01T03:00:00 100.6282
2010-01-01T08:00:00 101.2121
2010-01-01T13:00:00 100.2618
2010-01-01T18:00:00 99.5388
2010-01-01T23:00:00 100.599
I was expecting
12273x1 Temporal.TS{Float64,DateTime}: 2010-01-01T00:00:00 to 2016-12-31T20:00:00
Index price
2010-01-01T00:00:00
2010-01-01T05:00:00
2010-01-01T10:00:00
2010-01-01T15:00:00
2010-01-01T20:00:00
Any idea what is going on?
Is it a bug in my implementation, a bug in the collapse implementation, or a misunderstanding of how things work?
It seems that this issue has to do with the floor function you're using. Take a look at floor(ts.index, Dates.Hour(5)) for example. The first three elements are all 2009-12-31T22:00:00, then the next five are 2010-01-01T03:00:00, and so on. Seems like that is where things are breaking down.
Give this a try:
collapse(ts_price, t -> hour.(t) .% 5 .== 0, fun=mean)
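The behaviour described above can be reproduced with the Dates standard library alone. Julia rounds a DateTime relative to a fixed rounding epoch (0000-01-01T00:00:00), so 5-hour bins are not aligned with midnight of any given day. A minimal sketch, written for current Julia (`using Dates`), which is an assumption since the code in this thread targets an older version; the variable names are illustrative only:

```julia
using Dates

# First eight hourly timestamps of the series used in this thread
hours = collect(DateTime(2010, 1, 1):Hour(1):DateTime(2010, 1, 1, 7))

# Floor each timestamp to a 5-hour multiple counted from the rounding epoch
bins5 = [floor(t, Hour(5)) for t in hours]

# 24 is a multiple of 4, so 4-hour bins do line up with midnight
bins4 = [floor(t, Hour(4)) for t in hours]

println(bins5[1])  # 2009-12-31T22:00:00
println(bins5[4])  # 2010-01-01T03:00:00
println(bins4[1])  # 2010-01-01T00:00:00
```

The first three timestamps fall into a bin that starts the previous evening, which matches the shifted index seen in the resampled output.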
Good catch @dysonance
...
function mean(resampler::TimeSeriesResampler)
    ts = resampler.ts
    f_group = resampler.tf
    collapse(ts, f_group, fun=mean)
end
...
tf = t -> Dates.hour.(t) .% 5 .== 0
println(mean(resample(ts_price, tf)))
displays
12785x1 Temporal.TS{Float64,DateTime}: 2010-01-01T00:00:00 to 2016-12-31T20:00:00
Index price
2010-01-01T00:00:00 100.1817
2010-01-01T05:00:00 100.9888
2010-01-01T10:00:00 101.1463
2010-01-01T15:00:00 100.0456
2010-01-01T20:00:00 99.3642
2010-01-02T00:00:00 101.7938
...
I hope to have a function chaining / pipe syntax in Julia, but that's quite an old issue (JuliaLang/julia#5571).
ts_price.resample(tf).mean() (or similar if dot is not possible) would be great.
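Something close to that chained style is already possible with Julia's `|>` operator, provided a one-argument (curried) method is defined. The sketch below uses toy stand-ins on a plain vector; `Resampler`, the curried `resample(width)` method and `bin_means` are hypothetical names for illustration and are not part of Temporal or TimeSeriesResampler.jl:

```julia
# Toy stand-in: fixed-width bins over a plain vector (hypothetical, not the Temporal API)
struct Resampler
    values::Vector{Float64}
    width::Int
end

resample(values, width) = Resampler(values, width)

# Curried form: returns a function that can sit on the right-hand side of |>
resample(width) = values -> resample(values, width)

# Mean of each consecutive bin of `width` elements (last bin may be shorter)
function bin_means(r::Resampler)
    out = Float64[]
    for i in 1:r.width:length(r.values)
        hi = min(i + r.width - 1, length(r.values))
        push!(out, sum(r.values[i:hi]) / (hi - i + 1))
    end
    return out
end

result = collect(1.0:8.0) |> resample(4) |> bin_means
println(result)  # [2.5, 6.5]
```

The same pattern would apply to the real `resample(ts, tf)` by adding a method `resample(tf) = ts -> resample(ts, tf)`.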
I think my implementation of TimeFrame in TimeFrames.jl is buggy (because of this floor/ceil use).
You might try with the following index (with a 1 second offset)
idx=DateTime(2010,1,1):Dates.Hour(1):DateTime(2017,1,1)-Dates.Hour(1)
idx = collect(idx) + Dates.Second(1)
Here is what we get
12785x1 Temporal.TS{Float64,DateTime}: 2010-01-01T00:00:01 to 2016-12-31T20:00:01
Index price
2010-01-01T00:00:01 100.1817
2010-01-01T05:00:01 100.9888
2010-01-01T10:00:01 101.1463
2010-01-01T15:00:01 100.0456
2010-01-01T20:00:01 99.3642
2010-01-02T00:00:01 101.7938
Resampling with a 5-hour timeframe should return a TS from 2010-01-01T00:00:00 to 2016-12-31T20:00:00 (i.e. without the 1 second offset).
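If bins that restart at midnight every day are wanted (00:00, 05:00, 10:00, 15:00, 20:00), one option is to floor the hour within the day instead of flooring relative to the rounding epoch. A minimal sketch in current Julia syntax; `floor_within_day` is a hypothetical helper name, not something provided by Temporal or TimeFrames.jl:

```julia
using Dates

# Hypothetical helper: floor `dt` to a multiple of `hours` counted from midnight of its own day.
# Minutes, seconds and milliseconds are dropped, so a 1-second offset disappears.
function floor_within_day(dt::DateTime, hours::Integer)
    day_start = DateTime(Date(dt))   # midnight of the same day
    h = Dates.hour(dt)
    return day_start + Hour(h - h % hours)
end

a = floor_within_day(DateTime(2010, 1, 1, 3, 0, 1), 5)
b = floor_within_day(DateTime(2010, 1, 1, 23, 0, 1), 5)
println(a)  # 2010-01-01T00:00:00
println(b)  # 2010-01-01T20:00:00
```

Note that with a 5-hour width the last bin of each day covers only four hours (20:00 to 23:59), since 24 is not a multiple of 5.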
Let's instead use a TimeFrame of 4 hours (24 hours is a multiple of 4, not of 5). Using 5 hours was a poor choice on my part.
using Temporal
import Base: mean

# Wraps a time series together with the grouping function (the "timeframe")
struct TimeSeriesResampler
    ts
    tf
end

# Row names (dim == 1) are the index, column names (dim == 2) are the fields
function names(ts::TS, dim)
    if dim == 1
        ts.index
    elseif dim == 2
        ts.fields
    else
        throw(ArgumentError("dim must be 1 or 2"))
    end
end

function resample(ts, tf)
    TimeSeriesResampler(ts, tf)
end

function mean(resampler::TimeSeriesResampler)
    ts = resampler.ts
    f_group = resampler.tf
    z = Dates.Millisecond(0) # doesn't work simply with z=0
    # true wherever the floored timestamp differs from the previous one
    at = [false; diff(f_group.(names(ts, 1))) .!= z]
    collapse(ts, at, fun=mean)
end

srand(1234)
idx = DateTime(2010,1,1):Dates.Hour(1):DateTime(2017,1,1)-Dates.Hour(1)
idx = collect(idx) + Dates.Second(1)  # shift every timestamp by 1 second
n = length(idx)
price = 100 + cumsum(2*(rand(n)-0.5))
volume = rand(n)*1000
ts = TS([price volume], idx, [:price, :volume])
ts_price = ts[:price]
println(ts_price)
tf = dt -> floor(dt, Dates.Hour(4))
println(mean(resample(ts_price, tf)))
But I still don't understand why this 1 second offset is still present in the TS index:
15341x1 Temporal.TS{Float64,DateTime}: 2010-01-01T04:00:01 to 2016-12-31T20:00:01
Index price
2010-01-01T04:00:01 100.7737
2010-01-01T08:00:01 101.3009
2010-01-01T12:00:01 100.2978
2010-01-01T16:00:01 99.6753
2010-01-01T20:00:01 99.317
Given that the initial TS looks like
61368x1 Temporal.TS{Float64,DateTime}: 2010-01-01T00:00:01 to 2016-12-31T23:00:01
Index price
2010-01-01T00:00:01 100.1817
2010-01-01T01:00:01 100.7153
2010-01-01T02:00:01 100.8478
2010-01-01T03:00:01 100.7679
2010-01-01T04:00:01 101.356
2010-01-01T05:00:01 102.0643
2010-01-01T06:00:01 101.4654
and
julia> floor(DateTime("2010-01-01T00:00:01"), Dates.Hour(4))
2010-01-01T00:00:00
julia> floor(DateTime("2016-12-31T23:00:01"), Dates.Hour(4))
2016-12-31T20:00:00
we should expect the resampled TS index to run from 2010-01-01T00:00:00 to 2016-12-31T20:00:00
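Judging only from the outputs above (an inference, not checked against Temporal's source), collapse appears to label each group with the original timestamp found where `at` is true, and the floored timestamps are used only to detect where a bin changes. That would explain why the 1 second offset survives. A minimal sketch without Temporal, in current Julia syntax, that labels each group with the floored bin start instead; `resample_mean` is a hypothetical helper:

```julia
using Dates

# Hypothetical helper: average consecutive values that share the same floored
# timestamp, and label each group with that floored timestamp (the bin start).
function resample_mean(index::Vector{DateTime}, values::Vector{Float64}, period::Dates.TimePeriod)
    labels = [floor(t, period) for t in index]
    out_index = DateTime[]
    out_values = Float64[]
    start = 1
    for i in 2:length(labels) + 1
        # Close the current group at the end of the data or when the bin changes
        if i > length(labels) || labels[i] != labels[start]
            push!(out_index, labels[start])
            push!(out_values, sum(values[start:i-1]) / (i - start))
            start = i
        end
    end
    return out_index, out_values
end

# Twelve hourly points with a 1 second offset, values 1.0 through 12.0
idx = [t + Second(1) for t in DateTime(2010, 1, 1):Hour(1):DateTime(2010, 1, 1, 11)]
vals = collect(1.0:12.0)

out_index, out_values = resample_mean(idx, vals, Hour(4))
println(out_index)   # bin starts at 00:00:00, 04:00:00, 08:00:00 on 2010-01-01
println(out_values)  # [2.5, 6.5, 10.5]
```

With this labelling the resampled index starts at 2010-01-01T00:00:00 and carries no offset, which is the result expected above.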