QSCAN command proposal
antirez opened this issue · 9 comments
QSCAN [COUNT <count>] [BLOCKING] [MINLEN <len>] [MAXLEN <len>] [IMPORTRATE <rate>]
This issue is a proposal for a queue iteration command for Disque, able to enlist al the existing queues names, and providing filter capabilities to only enlist a subset of the existing queues. The command is non blocking, and is very similar to Redis SCAN
commands, providing a cursor based non blocking iterator that guarantees to return all the elements but may emit duplicates. Contrary to Redis a blocking option is provided for blocking, unique elements output semantics, because there are a great deal of use cases where the number of queues is small.
The command returns an iterator and a list of elements. We could set the default COUNT
to a large value like ~100, in order for most use cases dealing with a few queues to get all the queue names with just a single QSCAN
invocation, without any argument at all.
Options:
- BLOCKING Return all the elements without duplicates in a blocking way. Dangerous.
- MINLEN
<count>
Only return queues having at least (>=) count enqueued jobs. - MAXLEN
<count>
Only return queues having at most (<=) count enqueued jobs. - IMPORTRATE
<msgsec>
Only return queues importing at least msgsec messages per second from other nodes. If zero is specified all the queues importing a non-zero number of messages are returned. - COUNT
<count>
Perform count work at every iteration. Does not always map to the actual number of returned items like in Redis.
Please give some feedback. It's pretty trivial to implement but I don't want to write code straight away.
What do you think about removing the BLOCKING
option and having a separate DEBUG QUEUES
command? As you say, the blocking option is dangerous, can degrade the node's performance, and should only be used for debugging (similar to FLUSHALL
). Also, I don't think "blocking" is the best way to refer to this property: GETJOB
is blocking, but doesn't mean that it's dangerous and can degrade overall performance (similar to all the B*
operations in Redis).
We can change name, for example BUSYLOOP
can be a better option, but I believe the option is better added in the context of the command for the following arguments:
- The users is clearly specifying what he/she wants. No risk of "Hey but I was not aware that BUSYLOOP was going to block the server!".
- Having it into this context makes the user able to use all the other
QSCAN
options like filters. - Queues are not exactly keys, many many users will have just a handful of queues, so if you know what you are doing, calling
QSCAN BUSYLOOP
can be an actual sensible option outside the debugging scope.
However I believe BUSYLOOP
is better than BLOCKING
because of your reasoning... Blocking means something else in Disque / Redis context. Makes sense?
Makes sense!
In any case, if you're writing a tool for Disque you'll have to err on the safe side and use the iterator. Client libraries will probably offer a utility method to iterate automatically and the built-in disque
client could probably do the same. So BUSYLOOP
(or FORCEALL
or whatever) will be reserved for very specific use cases.
Yep I agree BUSYLOOP
is basically limited to just two cases:
- I've few queues and I don't want to de-dup since this is a freaking shell script or alike.
- I'm an human running a command to debug stuff.
Ok now we have an implementation of it as specified in this issue, please check the qscan
branch. Playing with it, looks like it performs as expected, I've to write the tests still...
Can I be the devil's advocate and say even with those two use cases I don't think BUSYLOOP
is needed at all? As you said @antirez it has a very specific subset of people using it, but I feel client libraries may missuse it to get around a simpler implementation of a full-loop of all queues. Under that scenario I think it'd do more harm than good, and I'm all for removing things people will hardly use, and when they do most of them are using it for the wrong reasons.
@mariano currently what we did is to provide as a default interface the sane one, as an alternative interface a broken one but with a decent use case. If people will decide to shoot themselves in the feet, at this point they can, but they'll very likely know what they are doing... However not having such an interface makes me worring that 99% of people having 10 or 20 queues or alike, will not have a trivial way to write a shell script or something like that. So I agree with you that the infinite lazyness of programmers makes us at risk of misuse, but at the same time we can actively write in the doc what we think about this, and the name itself is pretty self-describing... anyway currently other misuses are possible. For example COUNT
is unbound, and I think there is more room for misuse here actually... maybe we should set a max value which is automatically used when it is great than that.
Merged and documented into README.