Can't launch all celery workers/schedulers
Closed this issue · 15 comments
I'm trying to start all the workers/schedulers for the two-org setup.
If I run the command:
DJANGO_SETTINGS_MODULE=backend.settings.dev BACKEND_ORG=owkin BACKEND_DEFAULT_PORT=8000 BACKEND_PEER_PORT_EXTERNAL=9051 celery -E -A backend worker -l info -B -n owkin -Q owkin,scheduler,celery --hostname owkin.scheduler
the logs stay attached to my terminal and I can't run the commands to start the following instances. Thus I decided to pass the --detach argument, which works, but when trying to launch a new celery worker I get an error saying:
ERROR: Pidfile (celeryd.pid) already exists. Seems we're already running? (pid: 4815)
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/util.py", line 319, in _exit_function
    p.join()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 122, in join
    assert self._parent_pid == os.getpid(), 'can only join a child process'
AssertionError: can only join a child process
I could run the command with the --pidfile= flag (with no path), but is that the right way to go?
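An alternative to an empty pidfile would be to give every detached worker its own pidfile, so stale celeryd.pid files can't collide. This is only a sketch: the /tmp location is an assumption, and the %n (node name) expansion should be double-checked against the celery version in use.

```shell
# One pidfile per node instead of a shared celeryd.pid or none at all.
# Directory and naming are illustrative, not the project's documented setup.
PIDDIR="${PIDDIR:-/tmp/celery-pids}"
mkdir -p "$PIDDIR"

# Example invocation for the first worker (only attempted if celery is on PATH):
if command -v celery >/dev/null 2>&1; then
    DJANGO_SETTINGS_MODULE=backend.settings.dev BACKEND_ORG=owkin \
    BACKEND_DEFAULT_PORT=8000 BACKEND_PEER_PORT_EXTERNAL=9051 \
    celery -E --detach -A backend worker -l info -B -n owkin \
        -Q owkin,scheduler,celery --hostname owkin.scheduler \
        --pidfile="$PIDDIR/%n.pid" \
    || echo "celery launch failed (expected outside the project checkout)"
fi
```

The same --pidfile="$PIDDIR/%n.pid" argument would then be repeated on each of the other worker commands.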
Thanks!
Hello @chrisalexandrepena,
Did you try with different terminal windows?
Do you get the same error?
I tried running them in separate screen sessions but I always get this error. What is strange is that when I input all the lines specified in the repository README for the first time, without the --detach flag:
- It displayed the logs of my first worker
- I stopped the logs with a simple ctrl+C
- I could see through celery flower that my worker was still up
- If I powered off my running worker, all of a sudden celery launched the second one
- If I powered off that new worker celery launched the 3rd one, and so on
I haven't been able to replicate that behaviour since
Quick note: you may need to change DJANGO_SETTINGS_MODULE=backend.settings.dev to DJANGO_SETTINGS_MODULE=backend.settings.celery.dev, but this implies having a ledger running :)
So you still have the issue?
Are you on linux, mac, windows, or something else?
Maybe --pidfile is the solution for your environment, as described here: https://stackoverflow.com/questions/53521959/dockercelery-error-pidfile-celerybeat-pid-already-exists
But I don't think you need to use it at all.
My ledger is running so that shouldn't be a problem :)
I've just tried it, but DJANGO_SETTINGS_MODULE=backend.settings.celery.dev still gives me the error. I had already tried adding --pidfile="" before, and it does work. Only I wasn't sure whether that would cause any problems with the rest of the app?
I'm running the system in a linux VirtualBox VM (ubuntu server 18.04).
It should not cause any problems with the rest of the app, but I suggest you check all the celery processes running on your machine.
Maybe you will need to kill some: ps aux | grep celery
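One small refinement of that grep, for what it's worth: with a bracket in the pattern it never matches its own process, so the output shows only real leftovers.

```shell
# List any celery processes still alive; the [c] bracket trick means the
# grep process itself (whose command line contains "[c]elery", which the
# regex does not match) never shows up in the output.
ps aux | grep '[c]elery' || true

# Then stop each one by its PID (second column of the output), e.g.:
#   kill -TERM 30691
```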
If I try to launch all the tasks in a row, using the --detach and --pidfile="" flags, and after killing all previous celery worker instances, I get a strange error:
Traceback (most recent call last):
File "/home/ubuntu/dev/substra-backend/.venv/lib/python3.6/site-packages/kombu/utils/objects.py", line 42, in __get__
return obj.__dict__[self.__name__]
KeyError: 'data'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/ubuntu/dev/substra-backend/.venv/lib/python3.6/site-packages/hfc/fabric/user.py", line 257, in _restore_state
enrollment = state_dict['enrollment']
KeyError: 'enrollment'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/ubuntu/dev/substra-backend/.venv/bin/celery", line 8, in <module>
sys.exit(main())
File "/home/ubuntu/dev/substra-backend/.venv/lib/python3.6/site-packages/celery/__main__.py", line 16, in main
_main()
File "/home/ubuntu/dev/substra-backend/.venv/lib/python3.6/site-packages/celery/bin/celery.py", line 322, in main
cmd.execute_from_commandline(argv)
File "/home/ubuntu/dev/substra-backend/.venv/lib/python3.6/site-packages/celery/bin/celery.py", line 496, in execute_from_commandline
super(CeleryCommand, self).execute_from_commandline(argv)))
File "/home/ubuntu/dev/substra-backend/.venv/lib/python3.6/site-packages/celery/bin/base.py", line 275, in execute_from_commandline
return self.handle_argv(self.prog_name, argv[1:])
File "/home/ubuntu/dev/substra-backend/.venv/lib/python3.6/site-packages/celery/bin/celery.py", line 488, in handle_argv
return self.execute(command, argv)
File "/home/ubuntu/dev/substra-backend/.venv/lib/python3.6/site-packages/celery/bin/celery.py", line 420, in execute
).run_from_argv(self.prog_name, argv[1:], command=argv[0])
File "/home/ubuntu/dev/substra-backend/.venv/lib/python3.6/site-packages/celery/bin/worker.py", line 221, in run_from_argv
*self.parse_options(prog_name, argv, command))
File "/home/ubuntu/dev/substra-backend/.venv/lib/python3.6/site-packages/celery/bin/base.py", line 398, in parse_options
self.parser = self.create_parser(prog_name, command)
File "/home/ubuntu/dev/substra-backend/.venv/lib/python3.6/site-packages/celery/bin/base.py", line 414, in create_parser
self.add_arguments(parser)
File "/home/ubuntu/dev/substra-backend/.venv/lib/python3.6/site-packages/celery/bin/worker.py", line 277, in add_arguments
default=conf.worker_state_db,
File "/home/ubuntu/dev/substra-backend/.venv/lib/python3.6/site-packages/celery/utils/collections.py", line 126, in __getattr__
return self[k]
File "/home/ubuntu/dev/substra-backend/.venv/lib/python3.6/site-packages/celery/utils/collections.py", line 429, in __getitem__
return getitem(k)
File "/home/ubuntu/dev/substra-backend/.venv/lib/python3.6/site-packages/celery/utils/collections.py", line 278, in __getitem__
return mapping[_key]
File "/home/ubuntu/dev/substra-backend/.venv/lib/python3.6/collections/__init__.py", line 987, in __getitem__
if key in self.data:
File "/home/ubuntu/dev/substra-backend/.venv/lib/python3.6/site-packages/kombu/utils/objects.py", line 44, in __get__
value = obj.__dict__[self.__name__] = self.__get(obj)
File "/home/ubuntu/dev/substra-backend/.venv/lib/python3.6/site-packages/celery/app/base.py", line 141, in data
return self.callback()
File "/home/ubuntu/dev/substra-backend/.venv/lib/python3.6/site-packages/celery/app/base.py", line 924, in _finalize_pending_conf
conf = self._conf = self._load_config()
File "/home/ubuntu/dev/substra-backend/.venv/lib/python3.6/site-packages/celery/app/base.py", line 934, in _load_config
self.loader.config_from_object(self._config_source)
File "/home/ubuntu/dev/substra-backend/.venv/lib/python3.6/site-packages/celery/loaders/base.py", line 131, in config_from_object
self._conf = force_mapping(obj)
File "/home/ubuntu/dev/substra-backend/.venv/lib/python3.6/site-packages/celery/utils/collections.py", line 46, in force_mapping
if isinstance(m, (LazyObject, LazySettings)):
File "/home/ubuntu/dev/substra-backend/.venv/lib/python3.6/site-packages/django/utils/functional.py", line 213, in inner
self._setup()
File "/home/ubuntu/dev/substra-backend/.venv/lib/python3.6/site-packages/django/conf/__init__.py", line 44, in _setup
self._wrapped = Settings(settings_module)
File "/home/ubuntu/dev/substra-backend/.venv/lib/python3.6/site-packages/django/conf/__init__.py", line 107, in __init__
mod = importlib.import_module(self.SETTINGS_MODULE)
File "/home/ubuntu/dev/substra-backend/.venv/lib/python3.6/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 994, in _gcd_import
File "<frozen importlib._bootstrap>", line 971, in _find_and_load
File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 678, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/home/ubuntu/dev/substra-backend/backend/backend/settings/celery/dev.py", line 1, in <module>
from ..deps.ledger import *
File "/home/ubuntu/dev/substra-backend/backend/backend/settings/deps/ledger.py", line 32, in <module>
cert_path=LEDGER['client']['cert_path']
File "/home/ubuntu/dev/substra-backend/.venv/lib/python3.6/site-packages/hfc/fabric/user.py", line 348, in create_user
user = User(name, org, state_store)
File "/home/ubuntu/dev/substra-backend/.venv/lib/python3.6/site-packages/hfc/fabric/user.py", line 58, in __init__
self._restore_state()
File "/home/ubuntu/dev/substra-backend/.venv/lib/python3.6/site-packages/hfc/fabric/user.py", line 272, in _restore_state
raise IOError("Cannot deserialize the user", e)
OSError: [Errno Cannot deserialize the user] 'enrollment'
When I check after getting the error, a random number of the workers have been successfully spawned. If I launch it again, I get the same error but can see new workers spawning. And if I remove all active workers again and relaunch my command, I end up with the same error message and a different number of successfully spawned workers...
Using DJANGO_SETTINGS_MODULE=backend.settings.celery.dev doesn't affect the outcome... :(
What tasks are you trying to launch?
Where does this error come from? From a celery worker?
Is there a docker instance with a fabric ca running?
Can you show the result of:
$> docker ps -a
Thanks,
The command I'm trying to launch is:
DJANGO_SETTINGS_MODULE=backend.settings.celery.dev BACKEND_ORG=owkin BACKEND_DEFAULT_PORT=8000 BACKEND_PEER_PORT_EXTERNAL=9051 celery -E --detach -A backend worker -l info -B -n owkin -Q owkin,scheduler,celery --hostname owkin.scheduler --pidfile="" && \
DJANGO_SETTINGS_MODULE=backend.settings.celery.dev BACKEND_ORG=owkin BACKEND_DEFAULT_PORT=8000 BACKEND_PEER_PORT_EXTERNAL=9051 celery -E --detach -A backend worker -l info -B -n owkin -Q owkin,owkin.worker,celery --hostname owkin.worker --pidfile="" && \
DJANGO_SETTINGS_MODULE=backend.settings.celery.dev BACKEND_ORG=chu-nantes BACKEND_DEFAULT_PORT=8001 BACKEND_PEER_PORT_EXTERNAL=7051 celery -E --detach -A backend worker -l info -B -n chunantes -Q chu-nantes,scheduler,celery --hostname chu-nantes.scheduler --pidfile="" && \
DJANGO_SETTINGS_MODULE=backend.settings.celery.dev BACKEND_ORG=chu-nantes BACKEND_DEFAULT_PORT=8001 BACKEND_PEER_PORT_EXTERNAL=7051 celery -E --detach -A backend worker -l info -B -n chunantes -Q chu-nantes,chu-nantes.worker,celery --hostname chu-nantes.worker --pidfile="" && \
DJANGO_SETTINGS_MODULE=backend.settings.common celery --detach --pidfile="" -A backend beat -l info
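Since a random subset of the workers comes up each time, one guess (and it is only a guess) is that the detached processes race each other at startup, e.g. over the fabric user state store. A throwaway way to test that would be to stagger the launches; the helper function, the per-node pidfile paths, and the 5-second delay below are all illustrative.

```shell
# Debugging sketch only: the same four worker nodes as above, launched one
# at a time with a pause, to see whether the enrollment error is a startup race.
launch() {  # usage: launch SETTINGS ORG PORT PEER_PORT NODE QUEUES
    command -v celery >/dev/null 2>&1 || { echo "celery not on PATH, skipping $5"; return 0; }
    DJANGO_SETTINGS_MODULE="$1" BACKEND_ORG="$2" BACKEND_DEFAULT_PORT="$3" \
    BACKEND_PEER_PORT_EXTERNAL="$4" \
    celery -E --detach -A backend worker -l info -B \
        -Q "$6" --hostname "$5" --pidfile="/tmp/celery-$5.pid" || true
}

launch backend.settings.celery.dev owkin      8000 9051 owkin.scheduler      owkin,scheduler,celery
sleep 5
launch backend.settings.celery.dev owkin      8000 9051 owkin.worker         owkin,owkin.worker,celery
sleep 5
launch backend.settings.celery.dev chu-nantes 8001 7051 chu-nantes.scheduler chu-nantes,scheduler,celery
sleep 5
launch backend.settings.celery.dev chu-nantes 8001 7051 chu-nantes.worker    chu-nantes,chu-nantes.worker,celery
```

If the staggered version comes up reliably while the && chain does not, that would point at concurrent initialization rather than the pidfile handling.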
I've just tested your commands and it seems to work:
$> DJANGO_SETTINGS_MODULE=backend.settings.celery.dev BACKEND_ORG=owkin BACKEND_DEFAULT_PORT=8000 BACKEND_PEER_PORT_EXTERNAL=9051 celery -E --detach -A backend worker -l info -B -n owkin -Q owkin,scheduler,celery --hostname owkin.scheduler --pidfile="" && \
> DJANGO_SETTINGS_MODULE=backend.settings.celery.dev BACKEND_ORG=owkin BACKEND_DEFAULT_PORT=8000 BACKEND_PEER_PORT_EXTERNAL=9051 celery -E --detach -A backend worker -l info -B -n owkin -Q owkin,owkin.worker,celery --hostname owkin.worker --pidfile="" && \
> DJANGO_SETTINGS_MODULE=backend.settings.celery.dev BACKEND_ORG=chu-nantes BACKEND_DEFAULT_PORT=8001 BACKEND_PEER_PORT_EXTERNAL=7051 celery -E --detach -A backend worker -l info -B -n chunantes -Q chu-nantes,scheduler,celery --hostname chu-nantes.scheduler --pidfile="" && \
> DJANGO_SETTINGS_MODULE=backend.settings.celery.dev BACKEND_ORG=chu-nantes BACKEND_DEFAULT_PORT=8001 BACKEND_PEER_PORT_EXTERNAL=7051 celery -E --detach -A backend worker -l info -B -n chunantes -Q chu-nantes,chu-nantes.worker,celery --hostname chu-nantes.worker --pidfile="" && \
> DJANGO_SETTINGS_MODULE=backend.settings.common celery --detach --pidfile="" -A backend beat -l info
$> ps aux | grep celery
guillau+ 30691 0.0 0.2 667912 70660 pts/10 Sl 16:55 0:00 /home/guillaume/.venv/substrabac/bin/python /home/guillaume/.venv/substrabac/bin/celery -E --detach -A backend worker -l info -B -n owkin -Q owkin,scheduler,celery --hostname owkin.scheduler --pidfile=
guillau+ 30698 101 0.3 629560 102384 ? S 16:55 0:08 [celeryd: celery@owkin.scheduler:MainProcess] -active- (worker -l info -B -Q owkin,scheduler,celery -E -A backend --hostname=owkin.scheduler)
guillau+ 30807 0.0 0.2 667916 71052 pts/10 Sl 16:55 0:00 /home/guillaume/.venv/substrabac/bin/python /home/guillaume/.venv/substrabac/bin/celery -E --detach -A backend worker -l info -B -n owkin -Q owkin,owkin.worker,celery --hostname owkin.worker --pidfile=
guillau+ 30813 97.1 0.1 147064 53300 ? R 16:55 0:06 /home/guillaume/.venv/substrabac/bin/python -m celery worker -l info -B -Q owkin,owkin.worker,celery -E -A backend --hostname=owkin.worker
guillau+ 30847 0.0 0.2 667904 70904 pts/10 Sl 16:55 0:00 /home/guillaume/.venv/substrabac/bin/python /home/guillaume/.venv/substrabac/bin/celery -E --detach -A backend worker -l info -B -n chunantes -Q chu-nantes,scheduler,celery --hostname chu-nantes.scheduler --pidfile=
guillau+ 30855 108 0.2 537088 68528 ? R 16:55 0:05 /home/guillaume/.venv/substrabac/bin/python /home/guillaume/.venv/substrabac/bin/celery -E --detach -A backend worker -l info -B -n chunantes -Q chu-nantes,scheduler,celery --hostname chu-nantes.scheduler --pidfile=
guillau+ 30890 0.0 0.2 667916 70952 pts/10 Sl 16:55 0:00 /home/guillaume/.venv/substrabac/bin/python /home/guillaume/.venv/substrabac/bin/celery -E --detach -A backend worker -l info -B -n chunantes -Q chu-nantes,chu-nantes.worker,celery --hostname chu-nantes.worker --pidfile=
guillau+ 30896 100 0.2 537100 69128 ? R 16:55 0:04 /home/guillaume/.venv/substrabac/bin/python /home/guillaume/.venv/substrabac/bin/celery -E --detach -A backend worker -l info -B -n chunantes -Q chu-nantes,chu-nantes.worker,celery --hostname chu-nantes.worker --pidfile=
guillau+ 30928 0.0 0.1 617796 58628 pts/10 Sl 16:56 0:00 /home/guillaume/.venv/substrabac/bin/python /home/guillaume/.venv/substrabac/bin/celery --detach --pidfile= -A backend beat -l info
guillau+ 30937 132 0.1 487144 55244 ? R 16:56 0:02 /home/guillaume/.venv/substrabac/bin/python /home/guillaume/.venv/substrabac/bin/celery --detach --pidfile= -A backend beat -l info
guillau+ 30989 0.0 0.2 667928 71004 ? Sl 16:56 0:00 /home/guillaume/.venv/substrabac/bin/python -m celery worker -l info -B -Q owkin,scheduler,celery -E -A backend --hostname=owkin.scheduler
guillau+ 31007 0.0 0.2 628484 87104 ? R 16:56 0:00 [celeryd: celery@owkin.scheduler:MainProcess] -active- (worker -l info -B -Q owkin,scheduler,celery -E -A backend --hostname=owkin.scheduler)
guillau+ 31010 0.0 0.2 628740 87244 ? S 16:56 0:00 [celeryd: celery@owkin.scheduler:ForkPoolWorker-2]
guillau+ 31014 0.0 0.0 14784 1000 pts/10 S+ 16:56 0:00 grep --color=auto celery
All is running correctly.
I've also not been able to reproduce your error with the pidfile.
Maybe there is something wrong with your python virtualenv.
Can you give us the output of:
$> docker logs -f run-owkin
Of course:
external_orgs: ['chu-nantes']
Sign update proposal on chu-nantes ...
Send update proposal with org: chu-nantes...
Wait For Peers to join channel
Join channel substrachannel with peers ['peer1-owkin', 'peer2-owkin'] ...
Peers ['peer1-owkin', 'peer2-owkin'] successfully joined channel substrachannel
Installing chaincode on ['peer1-owkin', 'peer2-owkin'] ...
Installing chaincode on ['peer1-chu-nantes', 'peer2-chu-nantes'] ...
policy: OR('owkinMSP.member', 'chu-nantesMSP.member')
Upgraded chaincode with policy: {'identities': [{'role': {'name': 'member', 'mspId': 'owkinMSP'}}, {'role': {'name': 'member', 'mspId': 'chu-nantesMSP'}}], 'policy': {'1-of': [{'signed-by': 0}, {'signed-by': 1}]}} and result: "{'name': 'substracc', 'version': '2.0', 'escc': 'escc', 'vscc': 'vscc', 'policy': {'version': 0, 'rule': {'n_out_of': {'n': 1, 'rules': [{'signed_by': 0}, {'signed_by': 1}]}}, 'identities': [{'principal_classification': 'ROLE', 'principal': {'msp_identifier': 'owkinMSP', 'role': 'MEMBER'}}, {'principal_classification': 'ROLE', 'principal': {'msp_identifier': 'chu-nantesMSP', 'role': 'MEMBER'}}]}, 'data': {'hash': b'"\x0c% \xa2R\x81*\x89#\x18\x0fl\xaf8\xd9\x95{O\x8bN\xbc\xad\x15\xe8\x8b)\xebrz=8', 'metadatahash': b'\x0fz\x15\xa7\x95\x01*\xca\xe2\x88P \xae3\xf5\x07\xba\xee\xcd\xfb\xa85g\xa6\x7f\xafqW\x0f\x0f\xad\xc8'}, 'id': b'\xec\x06\xcf\xd7{\x16 \x13\x83H\x0b8\xe2J\xfa\xee\x91\\/\x1eLg\x85\x1d\xcdj\xdb_HO\x02\xf5', 'instantiation_policy': {'version': 0, 'rule': {'n_out_of': {'n': 1, 'rules': [{'signed_by': 0}, {'signed_by': 1}]}}, 'identities': [{'principal_classification': 'ROLE', 'principal': {'msp_identifier': 'chu-nantesMSP', 'role': 'ADMIN'}}, {'principal_classification': 'ROLE', 'principal': {'msp_identifier': 'owkinMSP', 'role': 'ADMIN'}}]}}"
Removing chaincode docker containers ...
1daa3b149306
570f179f8116
Try to query chaincode from peer ['peer1-owkin', 'peer2-owkin'] on org owkin
Queried chaincode, result: []
Congratulations! Ledger has been correctly initialized.
Everything is running fine on the ledger side; I don't see how you could be getting these errors.
Did you try with a docker setup for substra-backend?
I had not tried the docker setup; it does indeed work very well, thanks :)
Can we close this issue then?
I guess you can :)
Thanks,