redis/redis

Redis keyspace changes notification system.

Closed this issue · 25 comments

Introduction

An interesting feature in databases with a key-value data model (Redis does not perfectly fit this definition as the values are complex data structure, but the outer layer of Redis is definitely a key-value business) is the ability to subscribe in some way to the stream of events targeting a given key.

For instance I may be interested to see when the key foo is deleted or modified in some way, or to get the names of all the keys with an expire set (using the EXPIRE command) that Redis is evicting from the dataset because their time to live dropped to zero.

This feature was requested many times by the Redis user base, however so far we never reached a point where the proposed API (including proposals made by myself) seemed to fit well into the Redis design. This feature request will try to describe a new design that is extremely simple to use, to implement, and that fits well in the Redis story.

Proposal

Use of the standard Redis Pub/Sub system.

The notification system will use the Redis Pub/Sub mechanism already in place. In the past it was discussed if it would be appropriate to use a separated channels namespace compared to the normal Pub/Sub channels, but any kind of application that needs to use an arbitrary string as channel name without incurring into collisions with the keyspace notification system just needs to pick a unique prefix, like my_application_name:whatevery-string.

So the keyspace changes notification system will use standard Pub/Sub, just picking an appropriate prefix itself, that is not likely to collide with other uses.

Configurable, turned off by default

Publishing events in the Pub/Sub layer has an overhead even if no subscribers are present in the channel we are pushing to. This overhead is not big, but is measurable. At the same time not everybody will need the keyspace changes notification system. So it will be possible to turn on/off this system both via redis.conf and using CONFIG SET. By default the system will be turned off. In redis.conf you can configure it via the following directive:

notify-keyspace-events yes

Using the CONFIG command it is possible to turn the feature on or off at runtime:

CONFIG SET notify-keyspace-events yes

How events are published

Redis can operate more on a key in different ways, depending on the type of value the key holds. In our system we want to be able to provide this information to the user, by publishing the kind of modification that happened.
For instance if a command like LPUSH mykey somevalue is performed, it will result into two messages to be published in the Pub/Sub layer, equivalent to the following commands:

PUBLISH __keyspace@0__:mykey lpush
PUBLISH __keyevent@0__:lpush mykey

The event type is published in the channel about the key, while the key name is published in the channel about the event, so users can either get all the events about a given key, or all the keys that are targeted by a specific event.

As you can see the prefix includes @0 in order to target a specific Redis database number.

Type of events

Most events will have the name of the command targeting the key, like del, set and so forth, however more complex commands like MSET will result into multiple set events.

Additionally there are events that are not directly related to commands:

  • expired means that a key was expired by the dataset because the time to live dropped to zero.
  • evicted means that a key was removed from the database under memory pressure (in the case the maxmemory setting is active).

For instance in order to get push notifications of all the keys expiring a client just needs to:

SUBSCRIBE __keyevent@0__:expired

Implementation

The implementation will use the already existing API signalModifiedKey that is currently used in order to implement the WATCH command semantic.

ETA

I'll provide an implementation in the course of the next couple of weeks at max, probably earlier.

I believe hardcoding the prefix would be against the flexibility of Redis. Making prefix configurable would be pretty simple and actually implantable in a better form. So instead of letting user say _yes_ or _no_, let him choose a name like this:

CONFIG SET notify-keyspace-changes MyCustomPrefix

and in configuration file like this:

notify-keyspace-changes MyCustomPrefix

The above prefix configuration would cause a events to be published at channel (name) in this format:

PUBLISH MyCustomPrefix@0:expired foo

Not to mention this this would required nothing more that 1 configuration variable (char array instead of a boolean) to get the job done.

With this technique - how would I implement this scenario:

  1. Start some background job that will set a key when completed
  2. Listen to the completed event

Problem is that the event can fire between 1 and 2. I could start listening for the event before even starting the background job, but listening for the event is a blocking call (with the servicestack client at least).

The call I need is really, block and let me know when key x is ready - it may be ready already, in that case just let me know now.

@poulfoged Your background job should be appending to a pre-determined named list. To listen for the completed event is a blocking pop from that list.

@antirez I like where this is going.

I really like the possibility to listen for when a key expires. I have been trying to emulate that in the past, but haven't found any elegant solution.

I agree with @maxpert that it would be better to allow the user to choose a custom prefix (although his proposal doesn't make a distinction between keyevent and keyspace).

An alternative I can think of would be to allow users to register Lua callbacks on all / parts of the keyspace. You could use this mechanism to implement the above:

mycallback = function(key,op,args)
  redis.call('publish', '__keyspace@0__:' .. key, op)
  redis.call('publish', '__keyevent@0__:' .. op, key)
end

This could be used for other things like, for instance, auto-indexing:

mycallback = function(key,op,args)
  if (op == "hset") and (args[1] == "category") then
     redis.call("sadd", "index:category:" .. args[2], key)
  end
end

Note: I haven't thought about that idea so much, it could be a bad idea for a variety of reasons including performance issues. Auto-indexing as I described it could also be implemented with a daemon and the notification mechanism you propose, although 1) it would be asynchronous and 2) you didn't include the arguments in your proposal (ie. in your example we know something has been lpushed to mykey but we do not know it is myvalue).

Looking forward to it. Will be a great improvment IMHO

Interesting topic and comments.

First remark:

I think a number of people would prefer to receive notifications in queues (list) rather than pub/sub. Now it is possible to add a daemon subscribing to channels, and pushing the notifications to lists, but it means this daemon is a single point of failure in the system (and it will miss notifications when it is down).

It would be nice to have a "tee" command to automatically link a channel to a list. For instance, consider:

LSUBSCRIBE list channel

It would subscribe the list to the channel so that all items coming through the channel will be posted to the list (and also sent to the other pub/sub subscriptions). With this command, you can keep the pub/sub mechanism as a primary interface for notification management, and still offer the possibility to the users to buffer the notifications, and safely process them in a HA compatible environment.

Second remark:

This proposal (keyspace notification system) will raise the expectations of the users regarding expiration accuracy.

Recently I have been experimented with expiration notifications with this ugly hack:
https://gist.github.com/3258233

The current mechanism to handle expiration (lazy expiration + active random-based expiration) works perfectly as far as notifications are not sent. Now, let's suppose we have a large number of keys with expiration, and one key is about to expire. If the key is not accessed, it will be up to the active mechanism to expire it. The probability the active random sampling algorithm will find it quickly is very low, so the notification will be delayed (and by more than a few seconds - minutes are what I can measure in a realistic situation).

This problem is not theoretical: it really happens in practice when the number of keys is significant.

My conclusion is we need to change the data structure indexing the items to be expired to support expiration notification in a useful way. Here is my best solution:

  • remove expires dictionary from struct redisDb
  • add a 4-heap in struct redisDb implemented as a dynamic array (or deque). Each item is a timestamp and pointer to the dictionary element corresponding to the key. See the source code of libev for an example.
  • add an offset in struct dictEntry pointing to the corresponding entry of the 4-heap.
  • keep keyspace dictionary entries and 4-heap structure in sync, so that the item in the heap can be accessed from the dictionary entry and vice versa.
  • modify the active expiration algorithm to use the 4-heap instead of sampling randomly the keys. It can keep its current properties (i.e. adaptive, etc ...) except it will only process keys which really have to be expired.
  • when an expiration timer must be removed from the heap from a SET/PERSIST operation or from the lazy algorithm, just reset the timestamp/pointer of the entry [O(1)], and let the active algorithm get rid of the spurious entries.

That's a lot of work, but the following benefits can be expected:

  • accurate (sub-second) expiration notifications, at a low CPU consumption.
  • decrease memory footprint of items with expiration (no expires dictionary, a 4-heap is much more compact). The additional offset stored in dictionary entries is free (because with most 64 bits memory allocators including jemalloc and glibc malloc, 24 is not an allocation class. So 8 bytes are currently lost for each dictionary entry.
  • save rehashing of expires dictionary

There are also a few drawbacks:

  • complexity of SETEX, EXPIRE family of commands is now O(log n) instead of O(1). Complexity of SET/DEL/PERSIST, etc ... does not change though. If it is really a problem, we can imagine a mechanism where the timers are added systematically at the end of the heap [O(1)], and moved to their right location in the heap by the active expiration algorithm [O(n log n)]
  • the dynamic extension of the heap requires a clever mechanism to avoid impacting the latency at each reallocation
  • even if it is very compact, a heap is not really a COW friendly data structure

Sorry for this long entry.

Regards,
Didier.

How's this coming along?

Just wanted to say +1 for this feature. I could really use it.

With regards to the suggested implementation, The double underscore convention is one that is often seen in some programming languages, but not in Redis userland. Also, the fact that the __keyspace@N__ and __keyevent@N__ strings are hardcoded feels foreign. Most things in Redis are configurable, and I can't think of any other Redis commands that return the same string in some form every time they are called.

IMHO, the suggestion you had in #83 felt much more like Redis. Wouldn't it be simpler to add some new commands? I might suggest LISTEN, PLISTEN, UNLISTEN, PUNLISTEN. These commands would work like Redis' pub/sub mechanism (and could easily be implemented using Redis' pub/sub) but would always return a multi-bulk reply instead of an arbitrary user-defined payload. Also, they would take key names/patterns as arguments instead of channel names/patterns.

A session could look like this:

redis> LISTEN mykey
... when someone does lpush mykey ...
1) lpush
2) mykey
... when mykey expires ...
1) expired
2) mykey

The ability to listen to any particular operation (i.e. listening for any LPUSH, regardless of key) is not as useful in my experience. I believe that the ability to listen to events happening on particular keys and patterns of keys would serve the majority of use cases.

Anyway, as I started out saying this implementation feels a lot more like the rest of Redis.

@dspezia awesome remarks.

  1. Yes, there is this problem, but I wonder if we could solve this in a more general way adding the ability to create Lua scripts that are bound to channels and are executed every time a given channel receives a notification.
    Also in order to simplify the interface and to retain the scripts on restart, it is possible even to provide a feature so that if a key with a given name exists, the script contained is automatically executed.

Something like:

SET __call_on_publish__:<channelname> "... my script ..."

However this would not work with pattern matching ala PSUBSCRIBE, but probably we can't have everything.
On the other side no new API is required, scripts persist, are trivial to query, ...

  1. Yes the internals of the expire algorithm will became much more evident in this case. However there is no way we can improve the expiration algorithm without turning EXPIRE, TTL and similar commands into O(Log(N)) commands...
    About the implementation you propose, in general it seems reasonable to an interface to dict.c to store an auxiliar pointer, because it is for free as you suggest. This can be useful in other ways probably. What I'm not convinced with is the O(log(N)) requirement that we would add to a few important comments.

For instance in a cache all the SETEX calls will turn into O(Log(N)). My feeling is that this will have some effect, as the effect is clearly observable in sorted sets that use skip lists...

Trying to think a bit more about this, maybe there are ways to mitigate the problem retaining O(1) everywhere...

Btw maybe we don't need to stop this feature because of "2", but just alert the user when 2.8 will be released that there are limitations in the precision of the notification system for expired keys. Or maybe "2" will be solved before 2.8 is released so it will be an non-issue. The good thing is that both "1" and "2" will not change the API in any way.

Thanks very useful feedback!

@dspezia actually it's better than this probably, about the O(Log(N)), as we have the pointer in the dictEntry to the 4-heap entry, at least read operations like TTL are still O(1). Still things like SETEX would be O(Log(N)) that's my major concern...

+1, super excited for this.

I have a use case scenario:
I store several gameStates in redis. For each gamestate there are 2-4 players.
For each user there is a list of active games.

I would like to have the possibility to expire inactive game states.

Thus, the ability to trigger lua scripts upon expiration would allow me to update the users game lists to keep a consistent state in the database.

Any updates on this?

Hi,
This feature would help me out a lot, any updates on ETA?
Thanks, Meshi. :)

Hello, this feature is trivial to implement, I could write it in about two hours, but right now is blocked again...

Why? Read @dspezia comments in this issue, the two points he makes are too real to ignore.
The first is easily addressable using Lua scripting or some other mechanism.
The second one is a real problem right now: if we publish expire notifications, people will expect that those expire notifications are published when a key really expires, and this is not exactly the case because Redis expires are designed more to be almost for free (from the point of view of memory) than precise.

At the same time adding something in our system that is not O(1) when dealing with expires is not ideal... there is to evaluate how it actually performs, the additional memory used, the complexity it adds, and so forth.

An alternative system is to just enhance our current algorithm so that while not very precise it will be at least more reasonable for this usage.

There are many options but in order to evaluate all them and understand what to do I need to focus just on this issue, and currently PSYNC, Cluster and Sentinel are at the top of our priorities. However I hope to address this at least for 2.8 in some way, but I'm not able to provide an ETA about that.

Thank you for your quick reply!
We use Redis all the time and it's great! So thanks for that! )
Meshi.

melo commented

Hi,

not sure this if either of these are true:

@dspiza:

This proposal (keyspace notification system) will raise the expectations of the users regarding expiration accuracy.

@antirez:

people will expect that those expire notifications are published when a key really expires.

Expiration of keys was never a real-time thing (see EXPIRE docs, specially the scan Redis does to find already expired keys), We always knew that. We could get lucky and expire a key when we tried to access it or wait for it to be found. So adding expires notification at the time the current algorithm really expires a key changes nothing of the current O-notation of commands, adds no extra expectations, and gives a significant benefit for a lot of people who really want/need this.

It seems to me that the current blocking factor of this is a separate desire to make expiration more "reliable" (and I use "" because I find reliable to mean different things to different people). I would say that making expiration different than what it is today is a topic for another issue. Here, we just want notifications when a key expires, and not discuss the moment in time that event actually happens.

If/when a better expiration algorithm is devised, the notifications will still work as advertised: they trigger when redis-server expires the key. The moment that happens will only be much nearer the moment it should have happened.

Best regards,

I think @melo is right here, for what that's worth.

@melo, you changed my mind about that, expect an implementation of this soon. Thanks for your comment.

Btw actually the reality is that if the expiration is not good enough, it is not in many other areas, from memory usage in case of a big key that we can't expire in time, to the slave that does not get notifications (DELs commands) as soon as possible, and so forth. So maybe to have another case where this will be a non cool experience for the user will make more likely that the algorithm will be modified.

Incidentally I've an idea about an algorithm that does not modify complexity nor memory usage of operations but that makes expiration more prompt in the pathological cases (many keys to expire, only a few near to expire), but only adds some constant time.

Hey, an initial "play with it to get the feeling" branch implementing a few events is available (named "notifications").

Instructions here: https://gist.github.com/4608754

Feedbacks welcomed.

Implemented, closing.

do anyone know which PR implemented this feature?