bwlewis/doRedis

Stop running workers after user interrupt

Closed this issue · 2 comments

When the user interrupts a running foreach call, workers that are already processing tasks keep running. Ideally, those running tasks should stop. This would be extremely useful when individual tasks take a long time (more than an hour for complex MCMC computations, for example). This is an extension of issue #20, which has higher priority.

How to reproduce:

require(doRedis)
removeQueue('jobs')
redisFlushAll() # be careful: this removes everything on the Redis server!!!
registerDoRedis('jobs', '127.0.0.1')

foreach (i = 1:4) %dopar% {
    for (j in 1:10) {
        cat("*")
        flush.console()
        Sys.sleep(2)
    }
    cat("\n")
}

Now interrupt the running foreach call and watch the workers: they keep processing their current tasks.
(Tested with R 3.1.0, doRedis 1.1.1, rredis 1.6.9 and Redis server 2.6.12, all on a single Windows XP host.)

Expected behaviour: after some user-configurable timeout, the worker should figure out that the computation has been stopped and is no longer relevant. This could perhaps be done in the setOK thread, maybe by checking for the existence of some of the Redis variables related to this task queue (?). This would solve this issue, assuming issue #20 has already been solved.
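To make the idea concrete, here is a minimal sketch of such a check. It is not doRedis code: the key name `jobs.live` and the helpers `key_exists` and `run_task` are hypothetical, and an R environment stands in for the Redis server so the example runs without one. With rredis, the existence test could presumably be a call to `redisExists(key)`. The check here is cooperative (it runs between units of work), so it only illustrates the "does the flag still exist?" logic, not where doRedis would have to perform it.

```r
# Hypothetical sketch: a task polls a "liveness" flag between units of work
# and abandons the computation once the flag has disappeared.

# Stand-in for the Redis server: an environment used as a key-value store.
store <- new.env()

# Hypothetical helper; with rredis this might be redisExists(key) instead.
key_exists <- function(key) exists(key, envir = store, inherits = FALSE)

# Run `steps` units of work, checking the flag before each one.
run_task <- function(steps, alive_key, work = function(j) NULL) {
  done <- 0L
  for (j in seq_len(steps)) {
    if (!key_exists(alive_key)) {
      # The master removed the flag: the result is no longer wanted.
      return(list(status = "cancelled", done = done))
    }
    work(j)
    done <- done + 1L
  }
  list(status = "finished", done = done)
}

# Master side: the flag exists while the foreach call is running.
assign("jobs.live", TRUE, envir = store)

# Simulate a user interrupt during the third unit of work by deleting the flag.
res <- run_task(10, "jobs.live", work = function(j) {
  if (j == 3) rm("jobs.live", envir = store)
})

print(res$status)
print(res$done)
```

In this sketch the delay between the interrupt and the task stopping is bounded by the length of one unit of work, which is why a real implementation would more likely need a timer or background check than a loop inside user code.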

Note: solving this issue would probably mean that manual job queue cleanup with the removeQueue/registerDoRedis sequence is no longer needed in user code (see the question at http://stackoverflow.com/q/25947991/684229). That would be a good workaround for the serious issue #19 (which should be solved anyway).

I'm sorry I've been away from this for a long time. I'll start working on your issues this week.

Again, sorry for the long latency.

I think what this actually requires is that a running task associated with the interrupt is somehow stopped on the workers (that is, the interrupt is propagated on a per-task basis).

Just stopping the workers is probably a bad idea; they won't be able to service other jobs.
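A rough sketch of what per-job propagation could look like, as opposed to stopping workers outright. Everything here is an assumption made for illustration: the job ids, the `cancelled` store (an R environment standing in for flags kept on the Redis server), and the helpers `cancel_job`, `is_cancelled` and `worker_loop` are hypothetical and not part of doRedis. It only covers tasks that have not started yet; interrupting a task that is already running is the harder part.

```r
# Hypothetical sketch: cancellation is recorded per job, so a worker drops
# tasks that belong to a cancelled job but keeps serving everything else.

cancelled <- new.env()  # stand-in for per-job flags kept on the Redis server

cancel_job <- function(job_id) assign(job_id, TRUE, envir = cancelled)
is_cancelled <- function(job_id) {
  exists(job_id, envir = cancelled, inherits = FALSE)
}

# Each task carries the id of the job (foreach call) that submitted it.
queue <- list(
  list(job = "A", task = 1, expr = quote(1 + 1)),
  list(job = "B", task = 1, expr = quote(2 * 3)),
  list(job = "A", task = 2, expr = quote(10 - 4))
)

# The user interrupted the foreach call that submitted job A.
cancel_job("A")

worker_loop <- function(queue) {
  results <- list()
  for (item in queue) {
    if (is_cancelled(item$job)) next  # skip, but stay alive for other jobs
    key <- paste(item$job, item$task, sep = ".")
    results[[key]] <- eval(item$expr)
  }
  results
}

out <- worker_loop(queue)
print(names(out))
```

Only the task from job B is evaluated; the worker itself never exits, so it remains available for jobs submitted later.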

Signal propagation is going to be pretty darn tricky in this framework because the workers are effectively anonymous. So far I have not been able to figure this out adequately without introducing a huge amount of machinery that seriously diminishes the simplicity of doRedis.

If you've come up with ideas, please reopen this issue or send a pull request. We're very open to any good suggestions about this.