CGCL-codes/pFedSD

RuntimeError: [../third_party/gloo/gloo/transport/tcp/pair.cc:598] Connection closed by peer [172.28.0.12]:14004


First of all, thanks for your paper. pFedSD is a great case for FKD. When I run your pFedSD code, it always fails with inter-process communication errors such as "Process Process-4:
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/content/pFedSD/run_gloo.py", line 82, in main
    process.run()
  File "/content/pFedSD/pcode/workers/worker_pFedSD.py", line 47, in run
    self._send_model_to_master()
  File "/content/pFedSD/pcode/workers/worker_base.py", line 304, in _send_model_to_master
    dist.send(tensor=flatten_model.buffer, dst=0)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/distributed_c10d.py", line 1295, in send
    default_pg.send([tensor], dst, tag).wait()
RuntimeError: [../third_party/gloo/gloo/transport/tcp/pair.cc:598] Connection closed by peer [172.28.0.12]:43185".
I would appreciate any tips on how to resolve this error. Thanks.
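
For context, my understanding (an assumption, not something I have confirmed in the pFedSD code) is that this message means the other end of the TCP connection went away, which here would be the master at `dst=0`, for example because that process exited first. The sketch below reproduces that condition with a plain socket pair from the Python standard library. It does not use gloo or `torch.distributed`, and the names `peer_has_closed` and `demo` are hypothetical helpers made up for illustration.

```python
import socket


def peer_has_closed(sock: socket.socket, timeout: float = 0.1) -> bool:
    """Return True if the other end of `sock` has closed the connection.

    Hypothetical helper for illustration only; not part of pFedSD or gloo.
    """
    sock.settimeout(timeout)
    try:
        # Peek so that no pending data is consumed by this check.
        data = sock.recv(1, socket.MSG_PEEK)
    except socket.timeout:
        # Nothing to read yet, but the connection is still open.
        return False
    except ConnectionResetError:
        # The peer dropped the connection abruptly.
        return True
    # An empty read on a stream socket means the peer closed its end.
    return data == b""


def demo():
    # Stand-ins for the worker and master ends of one connection.
    worker_end, master_end = socket.socketpair()
    while_open = peer_has_closed(worker_end)
    master_end.close()  # simulates the master going away
    after_close = peer_has_closed(worker_end)
    worker_end.close()
    return while_open, after_close
```

If that reading is right, the traceback above would be a symptom on the worker side, and the first failure may be in the output of the master process instead.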