microsoft/DeepGNN

Connection reset by peer retry

MortezaRamezani opened this issue · 2 comments

Environment

  • Python version: Python 3.7.10
  • deepgnn-ge Version: 0.1.58.dev5
  • deepgnn-torch Version: N/A
  • deepgnn-tf Version: 0.1.58.dev5
  • OS: CentOS Linux release 7.9.2009, 3.10.0

Issue Details

  • Download a dataset (ppi):
    python -m deepgnn.graph_engine.data.ppi --data_dir /tmp/ppi
  • Create a client/server as follows:
# client.py
import json
import numpy as np
from deepgnn.graph_engine.snark.client import DistributedGraph

service_config_json = json.dumps({
    "methodConfig": [
        {
            "name": [{}],
            "waitForReady": True,
            "retryPolicy": {
                "maxAttempts": 5,
                "initialBackoff": "2s",
                "maxBackoff": "1s",
                "backoffMultiplier": 2,
                "retryableStatusCodes": ["UNAVAILABLE", "ABORTED"],
            },
        }
    ]
})
options = []
options.append(("grpc.enable_retries", 1))
options.append(("grpc.service_config", service_config_json))
g = DistributedGraph(["0.0.0.0:9090"], grpc_options=options)
for i in range(10000):
    f = g.node_features(np.array([0, 1, 13], dtype=np.int64), [[0, 100000000]], dtype=np.float32)
    print(f"#{i} features {f.shape}")

# server.py
from deepgnn.graph_engine.snark.server import Server
s = Server("/tmp/ppi", [0], "0.0.0.0:9090")
input("Server started..")
  • Use tcpkill to interrupt the connection
    sudo tcpkill -i lo -9 port 9090
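
A note on the retry policy in client.py: as far as the gRPC retry design describes it, the delay before the n-th retry is bounded by min(initialBackoff * backoffMultiplier^(n-1), maxBackoff), with the exact jitter depending on the gRPC version. The sketch below only computes that upper bound for the values used in this report. The helper names `parse_seconds` and `backoff_caps` are hypothetical and are not part of gRPC or DeepGNN.

```python
def parse_seconds(duration: str) -> float:
    """Parse a service-config duration such as "2s" or "0.5s" into seconds."""
    if not duration.endswith("s"):
        raise ValueError(f"unsupported duration: {duration!r}")
    return float(duration[:-1])


def backoff_caps(retry_policy: dict) -> list:
    """Return the upper bound (in seconds) of the delay before each retry.

    Assumption: the bound follows min(initial * multiplier**n, maximum);
    jitter applied by the real client is deliberately left out.
    """
    initial = parse_seconds(retry_policy["initialBackoff"])
    maximum = parse_seconds(retry_policy["maxBackoff"])
    multiplier = retry_policy["backoffMultiplier"]
    retries = retry_policy["maxAttempts"] - 1  # the first attempt is not a retry
    return [min(initial * multiplier ** n, maximum) for n in range(retries)]


# Retry policy values copied from client.py in this report.
policy = {
    "maxAttempts": 5,
    "initialBackoff": "2s",
    "maxBackoff": "1s",
    "backoffMultiplier": 2,
}
print(backoff_caps(policy))  # [1.0, 1.0, 1.0, 1.0]
```

Because maxBackoff (1s) is smaller than initialBackoff (2s), every retry delay is capped at one second under this formula, assuming the client accepts the config as written.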

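  • Expected behavior: the request is retried according to the retry policy above. Actual behavior: the call fails with the following error.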

E00000000 00:00:00.000000  1473 client.cc:309] RAW: Request failed, code: 14. Message: recvmsg:Connection reset by peer
E00000000 00:00:00.000000  1431 py_graph.cc:385] RAW: Exception while fetching node features: Request failed. Message: recvmsg:Connection reset by peer
Traceback (most recent call last):
  File "client.py", line 27, in <module>
    f = g.node_features(np.array([0, 1, 13], dtype=np.int64), [[0, 100000000]], dtype=np.float32)
  File "/home/morameza/envs/deepgnn-os/lib/python3.7/site-packages/deepgnn/graph_engine/snark/client.py", line 460, in node_features
    c_size_t(result.nbytes),
  File "/home/morameza/envs/deepgnn-os/lib/python3.7/site-packages/deepgnn/graph_engine/snark/client.py", line 47, in __call__
    raise Exception(f"Failed to {self.method}")
Exception: Failed to extract node features
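
Until retries work at the gRPC level, one possible stopgap is to retry in Python around the failing call. The traceback above shows the client raising a plain `Exception`, so this minimal sketch catches `Exception` broadly. `call_with_retry` is a hypothetical helper, not a DeepGNN API, and whether the underlying channel reconnects after a reset is an assumption that this thread does not confirm.

```python
import time


def call_with_retry(fn, max_attempts=5, initial_backoff=0.5, max_backoff=4.0,
                    multiplier=2.0, sleep=time.sleep):
    """Call fn(), retrying on any Exception with capped exponential backoff."""
    delay = initial_backoff
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(min(delay, max_backoff))
            delay *= multiplier


# Demo with a stand-in for g.node_features that fails twice, then succeeds.
calls = {"n": 0}


def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise Exception("Failed to extract node features")
    return "features"


delays = []  # record the requested delays instead of actually sleeping
result = call_with_retry(flaky_fetch, sleep=delays.append)
print(result, calls["n"], delays)  # features 3 [0.5, 1.0]
```

In client.py the call inside the loop would be wrapped as `call_with_retry(lambda: g.node_features(...))`. Catching every `Exception` also retries errors that are not transient, so a narrower check is preferable if the client exposes one.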

Thanks for reporting, @MortezaRamezani! You mentioned a solution for this issue offline. Could you share it here as well?

Sure, I'm creating the PR for it.