Worker "grabs" jobs for queuing locally, taking them from other gearman workers
paulmach opened this issue · 9 comments
This is a problem if you want to limit the amount of "functions" run concurrently, ie. if you want to run one job at a time.
What happens is the worker continuously accepts jobs from the server while only running running n at time (worker.New(n)
where n not equal to 0). The major problem with this is those jobs cannot be run by other workers, thus losing a lot of the power of gearman. Also pinging the server for job status will says all the locally queued jobs are running, when only n are (see example below).
Given this limitation, this code should not be used. I was unable to fix this in an elegant way as the nature of the implementation has a bunch of stuff going on concurrently and was unable to put in a nice fix to limit things.
I recommend simplifying everything so one worker has one connection to each gearman server and can only do one job at a time. If you need to run multiple functions at a time then create multiple workers
How to reproduce
- Have a a local gearman server running the this go package installed
- Run this code to queue up 100 background jobs https://gist.github.com/paulmach/5445876
- run
(echo status ; sleep 0.1) | nc 127.0.0.1 4730
on the console to see what is queued on the gearman server. You should see
test 100 0 0
.
This means 100 test jobs queued, 0 running, and 0 workers connected who can do this job.
- Run the test worker to process these jobs https://gist.github.com/paulmach/5445970
- This should run one job at a time and take a while to complete as each job is a 2 second sleep
- While the above is running, run
(echo status ; sleep 0.1) | nc 127.0.0.1 4730
in another terminal. You should see something like
test 97 19 1
.
This means 97 jobs are still waiting to complete, 19 are running and there is 1 worker working.
- THIS IS THE PROBLEM. the one testworker.go has 19 jobs assigned to it but is only doing one at a time. If another worker comes along, it will not get these 19 jobs. Also pinging the server for job status will say these 19 jobs are running when in fact only 1 is.
The major problem with this is those jobs cannot be run by other workers, thus losing a lot of the power of gearman.
It's really a bug. Thx!
-_-!
I recommend simplifying everything so one worker has one connection to each gearman server and can only do one job at a time. If you need to run multiple functions at a time then create multiple workers
This package was designed for giving users good experience.
It's always bad idea that give too many lower-details to users.
Let the GRAB limitation replace EXEC limitation may be a good way to try.
I will try to fix this issue today.
Hope it won't be too difficult. :)
Again, thank you for your help!
I'm still able to reproduce this using the steps above
Before this, a worker will grab more jobs than its capability.
This patch limit down the number of jobs.
A worker will grab two jobs at a time: one for executing; the other for waiting to put into queue.
It's still an issue. Reopened!
@mikespook Have you put any more thought into this issue? I can get it work if you just have one job per worker at a time. So you go from:
w := worker.New(10)
w.AddServer("127.0.0.1:4730")
w.AddFunc("ToUpper", ToUpper, worker.Immediately)
w.Work()
to
for i := 0; i < 10; i++ {
w := worker.New()
w.AddServer("127.0.0.1:4730");
w.AddFunc("ToUpper", ToUpper, worker.Immediately)
go w.Work()
}
This way there is one job running per connection to the server. It simplifies a lot of stuff.
Let me know what you think about this change and I can submit a pull request.
I think I am facing this issue, I have a job that launches another job and waits for it to complete.
Even if I create the worker with
w := worker.New(worker.OneByOne) // 1 job at a time
the worker still accepts the second job and queues it. Since the first job depends on the second to complete, it gets stuck.
Would it make sense for the worker launched with New(1) to accept only one job and wait for it to finish before accepting another one?
Is there any workaround to this?
I'm working on it. Hope it could be done before Xmas.
PS: Sorry for delay. A contract job has made me crazy.
"Executing Block" to "Grabbing block"
Known issue: ALL_YOURS is not supported
@mikespook How can we have multiple workers pick up the job and where do we need to update the Ip's of multple workers
@mikespook
in the below code i think we can specify but not sure whats that file name and where to update