kubeedge/sedna

Edge node cannot get cloud result during Joint-Inference helmet task

talenterj opened this issue · 4 comments

I deployed the Joint-Inference helmet task following the docs, and both the cloud and edge workers were created.
After adding a video to the edge node, I only get results in /data/hard_example_edge_inference_output and /data/output; the /data/hard_example_cloud_inference_output directory is empty.

The edge-worker logs show:

[2022-07-21 06:20:08,765] client.py(62) [WARNING] - Connection refused in request http://helmet-detection-inference-example-cloud.default:5000/sedna/predict
[2022-07-21 06:20:12,115] client.py(62) [WARNING] - Connection refused in request http://helmet-detection-inference-example-cloud.default:5000/sedna/predict
[2022-07-21 06:20:15,482] client.py(62) [WARNING] - Connection refused in request http://helmet-detection-inference-example-cloud.default:5000/sedna/predict
[2022-07-21 06:20:18,860] client.py(62) [WARNING] - Connection refused in request http://helmet-detection-inference-example-cloud.default:5000/sedna/predict
[2022-07-21 06:20:22,240] client.py(62) [WARNING] - Connection refused in request http://helmet-detection-inference-example-cloud.default:5000/sedna/predict
[2022-07-21 06:20:22,240] joint_inference.py(262) [ERROR] - get cloud result error: RetryError[<Future at 0x7f7451ae58d0 state=finished returned NoneType>]

What might be the reason?

The EdgeMesh logs report no errors.

I tested this on another edge node and the problem still exists: the edge node cannot reach port 5000 of the cloud service.
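
To narrow down this kind of failure, it can help to check name resolution and the TCP connection separately from inside the edge worker's network environment. The following is a minimal sketch, not part of Sedna: `probe_service` is a hypothetical helper, and the host name and port in the usage comment are simply taken from the log lines above.

```python
import socket


def probe_service(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify reachability of host:port.

    Returns one of: 'dns_failure', 'refused', 'timeout', 'unreachable', 'ok'.
    Hypothetical helper for illustration; not a Sedna or EdgeMesh API.
    """
    try:
        # Step 1: name resolution. This is the part a pod's dnsPolicy affects.
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failure"
    try:
        # Step 2: plain TCP connect to the resolved address.
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError:
        # Any other socket-level error, e.g. no route to host.
        return "unreachable"


# Example usage (run where the edge worker runs):
# print(probe_service("helmet-detection-inference-example-cloud.default", 5000))
```

A `dns_failure` result points at the DNS configuration rather than at the cloud worker itself, while `refused` or `timeout` suggests looking at the service, the port, or the network path.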

Finally solved it by following the EdgeMesh issues: the DNS policy needs to be specified.

How to fix your problem:

CLOUD_NODE="cloud-node-name"
EDGE_NODE="edge-node-name"

kubectl create -f - <<EOF
apiVersion: sedna.io/v1alpha1
kind: JointInferenceService
metadata:
  name: helmet-detection-inference-example
  namespace: default
spec:
  edgeWorker:
    model:
      name: "helmet-detection-inference-little-model"
    hardExampleMining:
      name: "IBT"
      parameters:
        - key: "threshold_img"
          value: "0.9"
        - key: "threshold_box"
          value: "0.9"
    template:
      spec:
        nodeName: $EDGE_NODE
        dnsPolicy: ClusterFirstWithHostNet  # <----------- LOOK HERE: this line is the fix
        containers:
          - image: kubeedge/sedna-example-joint-inference-helmet-detection-little:v0.3.0
            imagePullPolicy: IfNotPresent
            name: little-model
            env:  # user defined environments
              - name: input_shape
                value: "416,736"
              - name: "video_url"
                value: "rtsp://localhost/video"
              - name: "all_examples_inference_output"
                value: "/data/output"
              - name: "hard_example_cloud_inference_output"
                value: "/data/hard_example_cloud_inference_output"
              - name: "hard_example_edge_inference_output"
                value: "/data/hard_example_edge_inference_output"
            resources:  # user defined resources
              requests:
                memory: 64M
                cpu: 100m
              limits:
                memory: 2Gi
            volumeMounts:
              - name: outputdir
                mountPath: /data/
        volumes:  # user defined volumes
          - name: outputdir
            hostPath:
              # user must create the directory in host
              path: /joint_inference/output
              type: Directory

  cloudWorker:
    model:
      name: "helmet-detection-inference-big-model"
    template:
      spec:
        nodeName: $CLOUD_NODE
        containers:
          - image: kubeedge/sedna-example-joint-inference-helmet-detection-big:v0.3.0
            name: big-model
            imagePullPolicy: IfNotPresent
            env:  # user defined environments
              - name: "input_shape"
                value: "544,544"
            resources:  # user defined resources
              requests:
                memory: 2Gi
EOF

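This is general Kubernetes knowledge: if your pod uses the host network, you need to set dnsPolicy: ClusterFirstWithHostNet. Otherwise a host-network pod with the default ClusterFirst policy falls back to the node's DNS settings and cannot resolve cluster service names. You can find more information in the Kubernetes docs: https://kubernetes.io/zh/docs/concepts/services-networking/dns-pod-service/#pod-s-dns-policy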
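
The rule above can be turned into a quick check on a pod spec. This is only a sketch under assumptions: `check_dns_policy` is a hypothetical helper, and it expects the `spec` section of the pod as it actually runs (for example parsed from `kubectl get pod <name> -o json`), where `hostNetwork` is visible even if you did not write it in your own manifest.

```python
from typing import Optional


def check_dns_policy(pod_spec: dict) -> Optional[str]:
    """Return a warning if a host-network pod keeps the default DNS policy.

    Hypothetical helper for illustration; pod_spec is the 'spec' mapping
    of a Pod object, already parsed into a dict.
    """
    host_network = pod_spec.get("hostNetwork", False)
    # Kubernetes applies ClusterFirst when dnsPolicy is not set.
    dns_policy = pod_spec.get("dnsPolicy", "ClusterFirst")
    if host_network and dns_policy == "ClusterFirst":
        return (
            "hostNetwork is true but dnsPolicy is ClusterFirst; "
            "set dnsPolicy: ClusterFirstWithHostNet to use cluster DNS"
        )
    return None


# Example usage with hand-written specs:
broken = {"hostNetwork": True, "nodeName": "edge-node-name"}
fixed = {"hostNetwork": True, "dnsPolicy": "ClusterFirstWithHostNet"}
print(check_dns_policy(broken))  # prints the warning text
print(check_dns_policy(fixed))   # prints None
```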