CNI Tungsten Fabric does not work correctly

Question

noobcoderT opened this issue 4 years ago · 3 comments

Hi Tnaganawa,

I'm Zhijun Tang, an SDN developer.

I followed your instructions for installing TF as the k8s CNI using the single cni-tungsten-fabric.yaml file.
All TF pods are now in Running status, and the patches needed to make coredns work have been applied.

My k8s version is v1.12.3 and TF version is R5.1.

The k8s pod status and TF status are as follows:

[root@tf-k8s-master ~]# kubectl get node
NAME                      STATUS     ROLES    AGE     VERSION
tf-k8s-master.novalocal   NotReady   master   3h17m   v1.12.3
tf-k8s-node1.novalocal    Ready      <none>   3h14m   v1.12.3
tf-k8s-node2.novalocal    Ready      <none>   3h14m   v1.12.3

[root@tf-k8s-master ~]# kubectl get pod --all-namespaces
NAMESPACE     NAME                                              READY   STATUS    RESTARTS   AGE
default       deploy-test-toolbox-default-6c5655bd8-7n4nf       1/1     Running   0          168m
default       deploy-test-toolbox-default-6c5655bd8-rjc7g       1/1     Running   0          168m
kube-system   config-zookeeper-ps5j5                            1/1     Running   0          177m
kube-system   contrail-agent-44lcf                              2/2     Running   1          177m
kube-system   contrail-agent-vf4rf                              2/2     Running   1          177m
kube-system   contrail-analytics-bvnlq                          3/3     Running   0          177m
kube-system   contrail-config-database-nodemgr-ntcp6            1/1     Running   0          177m
kube-system   contrail-configdb-drx99                           1/1     Running   0          177m
kube-system   contrail-controller-config-wxnvl                  5/5     Running   0          177m
kube-system   contrail-controller-control-k7smg                 4/4     Running   0          177m
kube-system   contrail-controller-webui-kd6k6                   2/2     Running   0          177m
kube-system   contrail-kube-manager-7cscj                       1/1     Running   0          177m
kube-system   coredns-755db95bd6-fmk7h                          1/1     Running   0          171m
kube-system   coredns-755db95bd6-tv459                          1/1     Running   0          171m
kube-system   etcd-tf-k8s-master.novalocal                      1/1     Running   0          171m
kube-system   kube-apiserver-tf-k8s-master.novalocal            1/1     Running   0          171m
kube-system   kube-controller-manager-tf-k8s-master.novalocal   1/1     Running   0          171m
kube-system   kube-proxy-7nhrs                                  1/1     Running   0          152m
kube-system   kube-proxy-nrvnw                                  1/1     Running   0          152m
kube-system   kube-proxy-trp9c                                  1/1     Running   0          152m
kube-system   kube-scheduler-tf-k8s-master.novalocal            1/1     Running   0          171m
kube-system   kubernetes-dashboard-76456c6d4b-xdbhh             1/1     Running   0          3h1m
kube-system   rabbitmq-knvh8                                    1/1     Running   0          177m
kube-system   redis-8p5zh                                       1/1     Running   0          177m

[root@tf-k8s-master ~]# contrail-status
Pod              Service         Original Name                          State    Id            Status
                 redis           contrail-external-redis                running  15018a251b6e  Up 3 hours
analytics        api             contrail-analytics-api                 running  f331fd599d70  Up 3 hours
analytics        collector       contrail-analytics-collector           running  9bd0631d7382  Up 3 hours
analytics        nodemgr         contrail-nodemgr                       running  ebd6c53d7348  Up 3 hours
config           api             contrail-controller-config-api         running  3334220df3d9  Up 3 hours
config           device-manager  contrail-controller-config-devicemgr   running  4aeb4235717f  Up 3 hours
config           nodemgr         contrail-nodemgr                       running  b211f459f671  Up 3 hours
config           schema          contrail-controller-config-schema      running  1467b5c49830  Up 3 hours
config           svc-monitor     contrail-controller-config-svcmonitor  running  c775851eead3  Up 3 hours
config-database  cassandra       contrail-external-cassandra            running  fa6f3ea3baad  Up 3 hours
config-database  nodemgr         contrail-nodemgr                       running  94b3151d960e  Up 3 hours
config-database  rabbitmq        contrail-external-rabbitmq             running  60cb732a485d  Up 3 hours
config-database  zookeeper       contrail-external-zookeeper            running  ab5a1983d976  Up 3 hours
control          control         contrail-controller-control-control    running  7f8df6f6010c  Up 3 hours
control          dns             contrail-controller-control-dns        running  ed8aa660dba4  Up 3 hours
control          named           contrail-controller-control-named      running  10156ec12c86  Up 3 hours
control          nodemgr         contrail-nodemgr                       running  707bf7b43923  Up 3 hours
kubernetes       kube-manager    contrail-kubernetes-kube-manager       running  00a17ce46d6a  Up 3 hours
webui            job             contrail-controller-webui-job          running  aa37f4c1081c  Up 3 hours
webui            web             contrail-controller-webui-web          running  a10207defab7  Up 3 hours

== Contrail control ==
control: active
nodemgr: active
named: active
dns: active

== Contrail config-database ==
nodemgr: active
zookeeper: active
rabbitmq: active
cassandra: active

== Contrail kubernetes ==
kube-manager: active

== Contrail analytics ==
nodemgr: active
api: active
collector: active

== Contrail webui ==
web: active
job: active

== Contrail config ==
svc-monitor: active
nodemgr: active
device-manager: active
api: active
schema: active

[root@tf-k8s-node1 ~]# contrail-status
Pod      Service  Original Name           State    Id            Status
vrouter  agent    contrail-vrouter-agent  running  1cd7d8f25324  Up 3 hours
vrouter  nodemgr  contrail-nodemgr        running  2e04d65f9e23  Up 3 hours

vrouter kernel module is PRESENT
== Contrail vrouter ==
nodemgr: active
agent: active

Although everything appears to be in the right state, I cannot ping a pod IP from any workload node, and the NodePort service cannot be accessed at nodeip:nodeport.

[root@tf-k8s-master ~]# kubectl get pod -o wide
NAME                                          READY   STATUS    RESTARTS   AGE    IP              NODE                     NOMINATED NODE
deploy-test-toolbox-default-6c5655bd8-7n4nf   1/1     Running   0          3h8m   10.47.255.249   tf-k8s-node2.novalocal   <none>
deploy-test-toolbox-default-6c5655bd8-rjc7g   1/1     Running   0          3h8m   10.47.255.250   tf-k8s-node1.novalocal   <none>
[root@tf-k8s-master ~]# kubectl get svc -o wide
NAME                       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE     SELECTOR
kubernetes                 ClusterIP   10.96.0.1       <none>        443/TCP        3h21m   <none>
svc-test-toolbox-default   NodePort    10.98.238.254   <none>        80:30036/TCP   3h8m    app=pod-test-toolbox-default

[root@tf-k8s-node1 ~]# ping 10.47.255.250 -c4
PING 10.47.255.250 (10.47.255.250) 56(84) bytes of data.

--- 10.47.255.250 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 2999ms

[root@tf-k8s-node1 ~]# ping 10.47.255.249 -c4
PING 10.47.255.249 (10.47.255.249) 56(84) bytes of data.

--- 10.47.255.249 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 2999ms

[root@tf-k8s-node1 ~]# netstat -lptn|grep 30036
tcp6       0      0 :::30036                :::*                    LISTEN      24693/kube-proxy

[root@tf-k8s-node1 ~]# curl --connect-timeout 10 http://127.0.0.1:30036/ -v
* About to connect() to 127.0.0.1 port 30036 (#0)
*   Trying 127.0.0.1...
* Connection timed out after 10001 milliseconds
* Closing connection 0
curl: (28) Connection timed out after 10001 milliseconds
[root@tf-k8s-node1 ~]# curl --connect-timeout 10 http://10.98.238.254/ -v
* About to connect() to 10.98.238.254 port 80 (#0)
*   Trying 10.98.238.254...
* Connection timed out after 10001 milliseconds
* Closing connection 0
curl: (28) Connection timed out after 10001 milliseconds

So why does TF not work correctly when it is started with the single CNI yaml file? Do you have any idea how I can debug this issue?

Thanks,
Zhijun

Answer 1 · 2020-10-17T00:18:13.000Z

Hi, Zhijun.
Thanks for trying that document :)

One thing,
Is it possible to update from tungstenfabric:r5.1 to tungstenfabric:latest?

Technically, that ping should work, since a network-policy is automatically created between default-domain:ip-fabric:ip-fabric and default-domain:k8s-default:default-k8s-pod-network, but as I remember, this feature didn't work well around r5.1.

Answer 2 · 2020-10-19T02:58:58.000Z

Hi, Tnaganawa,
Thanks for your reply.

Currently I can only get the R5.1 version; it is difficult to pull the latest image.

I noticed that the yaml file sets CLOUD_ORCHESTRATOR to none. I changed it to "kubernetes", and it worked. I then checked the scripts in contrail-container-builder. In vrouter/agent/entrypoint.sh, if CLOUD_ORCHESTRATOR is "kubernetes", a static route, "ip route add $pod_cidr via $VROUTER_GATEWAY dev vhost0", is added on every workload node. That is why I can now reach all pod IPs from the workload nodes.
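
For anyone who wants to see what that route looks like with concrete values, here is a minimal dry-run sketch. It is not the actual entrypoint.sh; the helper name pod_route_cmd is hypothetical, and both the pod subnet 10.32.0.0/12 and the gateway 192.168.0.1 are assumed placeholder values that must be replaced with the ones from your own cluster.

```shell
#!/bin/sh
# Sketch only: reproduces the route command quoted above, it does not
# replace vrouter/agent/entrypoint.sh.
# Both defaults below are ASSUMPTIONS for illustration; use your own values.
pod_cidr="${pod_cidr:-10.32.0.0/12}"               # assumed pod subnet
VROUTER_GATEWAY="${VROUTER_GATEWAY:-192.168.0.1}"  # hypothetical gateway

# pod_route_cmd (hypothetical helper): print the route command for a
# given pod CIDR and gateway instead of executing it.
pod_route_cmd() {
    echo "ip route add $1 via $2 dev vhost0"
}

# Dry run: show the command so it can be reviewed before running it as root.
pod_route_cmd "$pod_cidr" "$VROUTER_GATEWAY"

# After the route is in place, the kernel's choice of path to a pod
# can be inspected with, for example:
#   ip route get 10.47.255.250
```

Printing the command rather than running it keeps the sketch safe on a node where vhost0 or the gateway differ from the assumed values.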

Answer 3 · 2021-02-20T15:27:37.000Z

I had a similar issue.
I followed the post below, and I am now more or less able to ping all the IPs from node1.

https://translate.google.com/translate?hl=en&sl=zh-CN&u=https://tungstenfabric.org.cn/post/65&prev=search&pto=aue