bwlewis/doRedis

Remote workers

Opened this issue · 6 comments

Hello!
I launched some remote workers with the script from the manual on several machines. Then I started a script with a lot of tasks (the bootstrap example from the manual). After completing about 10,000 tasks, the remote workers stopped getting jobs. This happens every time after some number of tasks when I start remote workers. If I start local workers, everything is OK.
I am running Redis under Windows. The Redis version is 2.6.14 and the R version is 3.0.3. I run the remote worker script through Rscript.exe.

Hi, sorry for the delay; I was on vacation! We're working through a bunch of issues with rredis and doRedis right now. Updates for both packages will be pushed to GitHub and sent to CRAN this weekend.

hifin commented

Hello. Here's another question related to remote workers. Right now if I want to run parallel jobs on several computers, I have to open an R session on each remote computer and run an R script containing the following two lines:
require(doRedis)
startLocalWorkers(n=number_of_local_workers, queue=job_queue_name, host=host_IP)

One way to avoid doing this every time is to keep using the same job queue name and avoid calling removeQueue (so the remote workers stay alive). However, I'm not sure if this is good practice, or if there is a better way to run parallel jobs.
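For reference, the same two-line script can also be launched non-interactively with Rscript; redisWorker() runs a single blocking worker loop in the current session (a minimal sketch; the queue name and host IP below are placeholders):

# worker.R -- run one doRedis worker in this R session (sketch)
library(doRedis)
redisWorker(queue = "jobs",          # must match the queue the master registers
            host = "192.168.0.10")   # Redis server address (placeholder)
# Then, on each remote machine:  Rscript worker.R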

Check out the scripts directory (at least if you're running Linux), especially on GitHub (a substantially revised/improved version is not yet on CRAN).

There is a doRedis service script for Linux, including a version that works nicely on Amazon.

I'm in the slow process of revising the documentation and making a tutorial about that...

hifin commented

Thank you! How about Windows? I thought about this after posting the question here. Using the PsExec tools and R's system() function may do the job, but I haven't tried it yet.
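A rough sketch of that idea from the master machine (untested; the host name, worker script path, and the assumption that Rscript.exe is on the remote PATH are all placeholders):

# Launch a worker on a remote Windows host via PsExec (sketch, untested)
# Assumes \\WORKER01 is reachable, PsExec is installed, and worker.R exists there.
system('psexec \\\\WORKER01 -d Rscript.exe C:\\workers\\worker.R')
# -d tells PsExec not to wait for the remote process, so the worker keeps running.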

Well, there is this ancient project that was used at Montefiore on Windows systems:

https://github.com/bwlewis/doRedisWindowsService

It has not been updated since 2011. Someday it should be updated to reflect the corresponding Linux service implementations... I'm not sure if or when I can get around to that, though; Windows is a system I almost never use.

I'm trying to do the same as hifin and Qwasser above on Windows, and I wondered whether either of you settled on a preferred approach to starting remote workers and keeping them alive by not removing the queue?

Bryan, I wonder if you could advise whether I'm likely to run into any problems with
a) reusing a queue, and
b) calling registerDoRedis with the same queue name without having removed it previously.

I'm currently using a combination of psexec and Task Scheduler on startup, but I'm finding the connections not reliable enough to run in production at the moment, possibly due to the multiple registerDoRedis calls.
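For reference, the reuse pattern in question looks roughly like this on the master side (a sketch; the queue name and host are placeholders):

# Register against an existing queue without having removed it first (sketch)
library(doRedis)
registerDoRedis(queue = "jobs", host = "192.168.0.10")
results <- foreach(i = 1:100, .combine = c) %dopar% sqrt(i)
# Deliberately not calling removeQueue("jobs"), so the remote workers stay
# alive and pick up the next job submitted to the same queue.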