Worker info not updated
teewhey opened this issue · 4 comments
Hi kamisama,
I ran into this scenario:
- From the fresque logs, I see these entries:
[2013-10-05T16:27:31+00:00] Exiting...
[2013-10-05T16:27:31+00:00] CONT received; resuming job processing
[2013-10-05T16:27:31+00:00] Exiting...
[2013-10-05T16:27:31+00:00] CONT received; resuming job processing
[2013-10-05T16:27:31+00:00] Exiting...
[2013-10-05T16:27:31+00:00] CONT received; resuming job processing
[2013-10-05T16:27:31+00:00] Exiting...
[2013-10-05T16:27:31+00:00] CONT received; resuming job processing
[2013-10-05T16:27:31+00:00] Exiting...
[2013-10-05T16:27:31+00:00] CONT received; resuming job processing
[2013-10-05T16:27:31+00:00] Exiting...
[2013-10-05T16:27:31+00:00] CONT received; resuming job processing
[2013-10-05T16:27:31+00:00] Exiting...
[2013-10-05T16:27:31+00:00] CONT received; resuming job processing
[2013-10-05T16:27:31+00:00] Exiting...
[2013-10-05T16:27:31+00:00] CONT received; resuming job processing
- When I run "ps -ef", I don't see any worker process running.
- When I run "fresque stats", it still lists the previous workers.
- I tried running "fresque stop", yet the worker info is still registered.
Any idea how to solve this?
Found a workaround.
Just run "fresque load". It will spawn workers and populate the list with the correct info.
1 - Did you try to pause and then resume the workers?
2 - The logs say that the workers were exiting, and therefore stopping. That's why you don't see the worker processes.
3 - Refer to 1: did you try to resume the workers?
4 - If you still see the workers in stats, then you should be able to stop them. Sometimes it can take a few seconds. If they're still listed after that, it means the workers were not properly stopped.
From the symptoms, it seems the workers were not properly stopped.
What were you trying to do when the logs showed:
[2013-10-05T16:27:31+00:00] CONT received; resuming job processing
[2013-10-05T16:27:31+00:00] Exiting...
Was it a normal job execution, or did you try to stop/resume/pause the workers?
Your workaround works because load starts workers, and before starting new workers, resque does some cleanup to remove workers that were not properly stopped. Doing a start or reset should give the same results.
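To illustrate the kind of cleanup described above, here is a minimal sketch of how stale worker entries could be detected. It assumes worker IDs follow the "hostname:pid:queues" convention used by Resque, and that a worker is stale when its PID no longer exists on the local host. The function names are hypothetical and are not part of fresque or php-resque; the real cleanup may work differently.

```python
import os


def parse_worker_id(worker_id):
    """Split an assumed 'hostname:pid:queues' worker ID into its parts."""
    host, pid, queues = worker_id.split(":", 2)
    return host, int(pid), queues.split(",")


def pid_is_alive(pid):
    """Return True if a process with this PID exists locally (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks existence, nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def find_stale_workers(worker_ids, hostname, is_alive=pid_is_alive):
    """Return registered worker IDs on this host whose process is gone.

    Workers registered from other hosts are skipped, because their PIDs
    cannot be checked from here.
    """
    stale = []
    for worker_id in worker_ids:
        host, pid, _queues = parse_worker_id(worker_id)
        if host == hostname and not is_alive(pid):
            stale.append(worker_id)
    return stale
```

In this sketch, the list of registered IDs would come from wherever the worker registry is stored (Redis, in this thread), and each stale ID returned would then be unregistered before new workers start.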
Just to clarify, the numbers 1, 2, 3, 4 in my post above simply refer to chronological order, not different scenarios. ; )
- I did. From the status I can see it was paused and resumed. But even before I did that, I ran "ps -ef" and it returned no worker process. The server was probably rebooted by a forced power-off.
- Correct. But when I run "fresque stats", it returns a list of workers, when it should return nothing since there is no worker process.
- Yes, I did.
- I didn't do anything after I started the worker. I just left it running on my test server. Based on the timestamp (UTC), it happened on a Sunday, when no one was in the office, and I'm the only one using this at the moment.
I did try "reset", "start", and "stop" before I found the workaround. They all returned the list of my previous workers, even though there were no processes for those workers.
The server was rebooted. That could have killed the worker processes while Redis still had the old worker list registered.
Seems like we've found our culprit. Thank you for your help.