CNI Tungsten Fabric does not work correctly
noobcoderT opened this issue · 3 comments
Hi Tnaganawa,
I'm Zhijun Tang, an SDN developer.
I followed your instructions for installing TF as the k8s CNI through a single cni-tungsten-fabric.yaml file.
Now all TF pods are in Running status, and patches were applied to make coredns work.
My k8s version is v1.12.3 and the TF version is R5.1.
The k8s pods and TF status are as follows:
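For reference, the install amounts to applying that single manifest (a minimal sketch of what I ran; the actual download URL is the one from your document):

# apply the all-in-one TF CNI manifest, then wait for the pods to settle
kubectl apply -f cni-tungsten-fabric.yaml
kubectl get pod -n kube-system -w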
[root@tf-k8s-master ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
tf-k8s-master.novalocal NotReady master 3h17m v1.12.3
tf-k8s-node1.novalocal Ready <none> 3h14m v1.12.3
tf-k8s-node2.novalocal Ready <none> 3h14m v1.12.3
[root@tf-k8s-master ~]# kubectl get pod --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default deploy-test-toolbox-default-6c5655bd8-7n4nf 1/1 Running 0 168m
default deploy-test-toolbox-default-6c5655bd8-rjc7g 1/1 Running 0 168m
kube-system config-zookeeper-ps5j5 1/1 Running 0 177m
kube-system contrail-agent-44lcf 2/2 Running 1 177m
kube-system contrail-agent-vf4rf 2/2 Running 1 177m
kube-system contrail-analytics-bvnlq 3/3 Running 0 177m
kube-system contrail-config-database-nodemgr-ntcp6 1/1 Running 0 177m
kube-system contrail-configdb-drx99 1/1 Running 0 177m
kube-system contrail-controller-config-wxnvl 5/5 Running 0 177m
kube-system contrail-controller-control-k7smg 4/4 Running 0 177m
kube-system contrail-controller-webui-kd6k6 2/2 Running 0 177m
kube-system contrail-kube-manager-7cscj 1/1 Running 0 177m
kube-system coredns-755db95bd6-fmk7h 1/1 Running 0 171m
kube-system coredns-755db95bd6-tv459 1/1 Running 0 171m
kube-system etcd-tf-k8s-master.novalocal 1/1 Running 0 171m
kube-system kube-apiserver-tf-k8s-master.novalocal 1/1 Running 0 171m
kube-system kube-controller-manager-tf-k8s-master.novalocal 1/1 Running 0 171m
kube-system kube-proxy-7nhrs 1/1 Running 0 152m
kube-system kube-proxy-nrvnw 1/1 Running 0 152m
kube-system kube-proxy-trp9c 1/1 Running 0 152m
kube-system kube-scheduler-tf-k8s-master.novalocal 1/1 Running 0 171m
kube-system kubernetes-dashboard-76456c6d4b-xdbhh 1/1 Running 0 3h1m
kube-system rabbitmq-knvh8 1/1 Running 0 177m
kube-system redis-8p5zh 1/1 Running 0 177m
[root@tf-k8s-master ~]# contrail-status
Pod Service Original Name State Id Status
redis contrail-external-redis running 15018a251b6e Up 3 hours
analytics api contrail-analytics-api running f331fd599d70 Up 3 hours
analytics collector contrail-analytics-collector running 9bd0631d7382 Up 3 hours
analytics nodemgr contrail-nodemgr running ebd6c53d7348 Up 3 hours
config api contrail-controller-config-api running 3334220df3d9 Up 3 hours
config device-manager contrail-controller-config-devicemgr running 4aeb4235717f Up 3 hours
config nodemgr contrail-nodemgr running b211f459f671 Up 3 hours
config schema contrail-controller-config-schema running 1467b5c49830 Up 3 hours
config svc-monitor contrail-controller-config-svcmonitor running c775851eead3 Up 3 hours
config-database cassandra contrail-external-cassandra running fa6f3ea3baad Up 3 hours
config-database nodemgr contrail-nodemgr running 94b3151d960e Up 3 hours
config-database rabbitmq contrail-external-rabbitmq running 60cb732a485d Up 3 hours
config-database zookeeper contrail-external-zookeeper running ab5a1983d976 Up 3 hours
control control contrail-controller-control-control running 7f8df6f6010c Up 3 hours
control dns contrail-controller-control-dns running ed8aa660dba4 Up 3 hours
control named contrail-controller-control-named running 10156ec12c86 Up 3 hours
control nodemgr contrail-nodemgr running 707bf7b43923 Up 3 hours
kubernetes kube-manager contrail-kubernetes-kube-manager running 00a17ce46d6a Up 3 hours
webui job contrail-controller-webui-job running aa37f4c1081c Up 3 hours
webui web contrail-controller-webui-web running a10207defab7 Up 3 hours
== Contrail control ==
control: active
nodemgr: active
named: active
dns: active
== Contrail config-database ==
nodemgr: active
zookeeper: active
rabbitmq: active
cassandra: active
== Contrail kubernetes ==
kube-manager: active
== Contrail analytics ==
nodemgr: active
api: active
collector: active
== Contrail webui ==
web: active
job: active
== Contrail config ==
svc-monitor: active
nodemgr: active
device-manager: active
api: active
schema: active
[root@tf-k8s-node1 ~]# contrail-status
Pod Service Original Name State Id Status
vrouter agent contrail-vrouter-agent running 1cd7d8f25324 Up 3 hours
vrouter nodemgr contrail-nodemgr running 2e04d65f9e23 Up 3 hours
vrouter kernel module is PRESENT
== Contrail vrouter ==
nodemgr: active
agent: active
Though everything is in the right state, I cannot ping the pod IPs from any workload node, and the NodePort service cannot be accessed via nodeip:nodeport.
[root@tf-k8s-master ~]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
deploy-test-toolbox-default-6c5655bd8-7n4nf 1/1 Running 0 3h8m 10.47.255.249 tf-k8s-node2.novalocal <none>
deploy-test-toolbox-default-6c5655bd8-rjc7g 1/1 Running 0 3h8m 10.47.255.250 tf-k8s-node1.novalocal <none>
[root@tf-k8s-master ~]# kubectl get svc -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 3h21m <none>
svc-test-toolbox-default NodePort 10.98.238.254 <none> 80:30036/TCP 3h8m app=pod-test-toolbox-default
[root@tf-k8s-node1 ~]# ping 10.47.255.250 -c4
PING 10.47.255.250 (10.47.255.250) 56(84) bytes of data.
--- 10.47.255.250 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 2999ms
[root@tf-k8s-node1 ~]# ping 10.47.255.249 -c4
PING 10.47.255.249 (10.47.255.249) 56(84) bytes of data.
--- 10.47.255.249 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 2999ms
[root@tf-k8s-node1 ~]# netstat -lptn|grep 30036
tcp6 0 0 :::30036 :::* LISTEN 24693/kube-proxy
[root@tf-k8s-node1 ~]# curl --connect-timeout 10 http://127.0.0.1:30036/ -v
* About to connect() to 127.0.0.1 port 30036 (#0)
* Trying 127.0.0.1...
* Connection timed out after 10001 milliseconds
* Closing connection 0
curl: (28) Connection timed out after 10001 milliseconds
[root@tf-k8s-node1 ~]# curl --connect-timeout 10 http://10.98.238.254/ -v
* About to connect() to 10.98.238.254 port 80 (#0)
* Trying 10.98.238.254...
* Connection timed out after 10001 milliseconds
* Closing connection 0
curl: (28) Connection timed out after 10001 milliseconds
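The vrouter side of this is still opaque to me; the only checks I know of are roughly these (a sketch: vif and flow are the standard Contrail vrouter utilities, and the agent pod/container names are taken from the pod listing above, so they may differ):

# list vrouter interfaces on the node (the pod's tap interface should appear here)
kubectl exec -n kube-system contrail-agent-44lcf -c contrail-vrouter-agent -- vif --list
# dump the active flow table while re-running the ping/curl above
kubectl exec -n kube-system contrail-agent-44lcf -c contrail-vrouter-agent -- flow -l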
So what's the problem that makes TF not work correctly when started from the single CNI yaml file? Do you have any idea how I can debug this issue?
Thanks,
Zhijun
Hi, Zhijun.
Thanks for trying that document :)
One thing,
Is it possible to update from tungstenfabric:r5.1 to tungstenfabric:latest?
Technically, that ping should work, since a network-policy is automatically created between default-domain:ip-fabric:ip-fabric and default-domain:k8s-default:default-k8s-pod-network, but as I remember, this feature didn't work well around r5.1.
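If you want to confirm whether that policy object was actually created, one way is to list it from the config API (a sketch; 8082 is the default contrail config-api port, the collection name follows Contrail's API pluralization, and you need to substitute your controller IP):

# list all network-policy objects known to the config API
curl -s http://<controller-ip>:8082/network-policys | python -m json.tool
# the auto-created policy should reference both ip-fabric and default-k8s-pod-network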
Hi, Tnaganawa,
Thanks for your reply.
Currently I can only get the R5.1 version; it's difficult for me to pull the latest image.
I noticed that in the yaml file, CLOUD_ORCHESTRATOR was specified as none. I changed it to "kubernetes", and that worked. I then checked the scripts of contrail-container-builder: in vrouter/agent/entrypoint.sh, if CLOUD_ORCHESTRATOR is "kubernetes", a static route ("ip route add $pod_cidr via $VROUTER_GATEWAY dev vhost0") is added on every workload node, so I can now reach all pod IPs from the workload nodes.
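For anyone hitting the same thing, the change is roughly the following (a sketch; the exact quoting and location of CLOUD_ORCHESTRATOR inside the manifest may differ between manifest versions):

# switch the orchestrator from none to kubernetes in the manifest, then re-apply
sed -i 's/CLOUD_ORCHESTRATOR: .*/CLOUD_ORCHESTRATOR: "kubernetes"/' cni-tungsten-fabric.yaml
kubectl apply -f cni-tungsten-fabric.yaml
# once the agent restarts, entrypoint.sh should have installed the pod route on each node:
ip route | grep vhost0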
I had a similar issue.
I followed this and I am now more or less able to ping all IPs from node1.