etsy/skyline

alternatives to redis?

Opened this issue · 4 comments

Hi people,
I'm currently working on a quite similar problem (trying to tame a gazillion metrics) and trying to get some kind of anomaly detection into the mess.

So I wanted to ask: why is Redis the backend? It seems a strange choice to me; being in memory, it is limited by memory rather than disk (and you usually have a hell of a lot more disk). It also seems to require storing the timestamp with each metric, effectively doubling (or more, given the msgpack overhead) the storage consumption. And last but not least, I can't see how the append operation is considered O(1) when it needs to relocate the whole data every time the size doubles; it sounds more like O(sqrt(n)), given that each appended datapoint is always the same size.

What I could not find is how long historical data is preserved. Given that I know I'm producing about 8 GB a day, I can see running out of memory in about 16 days without taking overhead into account, and probably in 8 days or less with it.

So I was wondering if you would be up for a discussion of alternative backend(s). I currently ended up using Cassandra behind KairosDB (easier to write to and nice for aggregation), which so far works quite well and has a very sound storage mechanism with Cassandra's column-based storage.

Cheers,
Heinz

being in memory, it is limited by memory rather than disk
Yes, but memory is way faster than disk. We want real time, right?

It also seems to require storing the timestamp

Storing the timestamp was a decision that we made because we had different levels of resolution in our data. Some algorithms might need to make use of the actual time in order to work. None have so far, though :) I'd be open to adding a setting to not store any timestamps - it'd be a big memory boost.
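For a rough sense of what that would save (this is just an illustration, not Skyline code; exact byte counts depend on the values being packed):

```python
# Illustrative only: size of one msgpack-packed datapoint with and without
# its timestamp. Exact byte counts vary with the actual values.
import msgpack

timestamp, value = 1379833800, 42.5
with_ts = msgpack.packb([timestamp, value])   # packed [timestamp, value] pair
without_ts = msgpack.packb(value)             # packed value only

print(len(with_ts), len(without_ts))          # e.g. 15 vs 9 bytes for these values
```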

I can't see how the append operation is considered O(1)
http://redis.io/commands/append
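Per that doc, APPEND is amortized O(1): the underlying string doubles its free space on reallocation, so each byte gets copied only a constant number of times on average. A minimal sketch of the append-and-unpack pattern, assuming a local Redis and the redis/msgpack Python packages (the key name is just illustrative):

```python
# Sketch of appending packed datapoints to a Redis string and reading them back.
import msgpack
import redis

r = redis.StrictRedis(host='localhost', port=6379)

# Each write is a single APPEND of one packed [timestamp, value] pair.
r.append('metrics.example.requests', msgpack.packb([1379833800, 42.5]))

# Reading back: fetch the whole string and stream-unpack the concatenated pairs.
unpacker = msgpack.Unpacker()
unpacker.feed(r.get('metrics.example.requests'))
timeseries = list(unpacker)   # [[timestamp, value], ...]
```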

how long historical data is preserved
settings.FULL_DURATION
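FULL_DURATION is the rolling window (in seconds) of history kept per metric. A back-of-envelope estimate of what that costs in memory, where every number except FULL_DURATION is an assumption made up for illustration:

```python
# Back-of-envelope memory estimate. Only FULL_DURATION is a real Skyline setting;
# the other numbers are illustrative assumptions.
FULL_DURATION = 86400        # seconds of rolling history kept per metric (one day)
resolution = 60              # assumed seconds between datapoints
bytes_per_point = 15         # assumed size of one packed [timestamp, value] pair
num_metrics = 100000         # assumed number of metrics

points_per_metric = FULL_DURATION // resolution
total_gb = points_per_metric * bytes_per_point * num_metrics / 1e9
print("~%.1f GB before Redis overhead" % total_gb)   # ~2.2 GB with these numbers
```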

a different backend
I'll need some more convincing. At scale, if you want very quick detection, you really need to use an in-memory datastore. That becomes less true as you have a smaller number of metrics, though. However, if we can think of a modular and easy way to support different backends, I'd be open to supporting that in the project.
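For what it's worth, a sketch of what such a seam could look like (these names are hypothetical and don't exist in Skyline today; Redis, Cassandra/KairosDB, etc. would each just be one implementation):

```python
# Hypothetical pluggable timeseries backend interface -- illustrative names only.
from abc import ABC, abstractmethod


class TimeseriesBackend(ABC):
    @abstractmethod
    def append(self, metric, timestamp, value):
        """Store a single datapoint for a metric."""

    @abstractmethod
    def fetch(self, metric, start, end):
        """Return [(timestamp, value), ...] for the analyzer to run its algorithms over."""
```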

Abe Stanway
abe.is

hit9 commented

Redis makes it difficult to run conditional queries.

What about TempoDB as an extra backend?

hit9 commented

How about SSDB?