kubernetes/kube-proxy

kube-proxy failed to restore iptables

yongxiu opened this issue · 5 comments

kube-proxy version: kube-proxy-amd64:v1.22.8

kube-proxy failed to restore iptables rules; the verbose logs are below:

I1225 03:04:45.320763       1 proxier.go:1355] "Opened local port" port="\"nodePort for gke-system/istio-ingress:status-port\" (:31937/tcp4)"
I1225 03:04:45.321514       1 traffic.go:91] [DetectLocalByCIDR (10.240.0.0/13)] Jump Not Local: [-m comment --comment "gke-system/istiod:https-webhook cluster IP" -m tcp -p tcp -d 172.26.172.173/32 --dport 443 ! -s 10.240.0.0/13 -j KUBE-MARK-MASQ]
I1225 03:04:45.321712       1 traffic.go:91] [DetectLocalByCIDR (10.240.0.0/13)] Jump Not Local: [-m comment --comment "cert-manager/cert-manager-webhook:https cluster IP" -m tcp -p tcp -d 172.26.75.7/32 --dport 443 ! -s 10.240.0.0/13 -j KUBE-MARK-MASQ]
I1225 03:04:45.321945       1 traffic.go:91] [DetectLocalByCIDR (10.240.0.0/13)] Jump Not Local: [-m comment --comment "gke-system/istio-ingress:https cluster IP" -m tcp -p tcp -d 172.26.94.145/32 --dport 443 ! -s 10.240.0.0/13 -j KUBE-MARK-MASQ]
I1225 03:04:45.322172       1 proxier.go:1355] "Opened local port" port="\"nodePort for gke-system/istio-ingress:https\" (:30429/tcp4)"
I1225 03:04:45.322334       1 traffic.go:91] [DetectLocalByCIDR (10.240.0.0/13)] Jump Not Local: [-m comment --comment "default/kubernetes:https cluster IP" -m tcp -p tcp -d 172.26.0.1/32 --dport 443 ! -s 10.240.0.0/13 -j KUBE-MARK-MASQ]
I1225 03:04:45.322489       1 traffic.go:91] [DetectLocalByCIDR (10.240.0.0/13)] Jump Not Local: [-m comment --comment "cert-manager/cert-manager:tcp-prometheus-servicemonitor cluster IP" -m tcp -p tcp -d 172.26.147.25/32 --dport 9402 ! -s 10.240.0.0/13 -j KUBE-MARK-MASQ]
I1225 03:04:45.322636       1 traffic.go:91] [DetectLocalByCIDR (10.240.0.0/13)] Jump Not Local: [-m comment --comment "gke-system/istiod:http-monitoring cluster IP" -m tcp -p tcp -d 172.26.172.173/32 --dport 9093 ! -s 10.240.0.0/13 -j KUBE-MARK-MASQ]
I1225 03:04:45.322799       1 traffic.go:91] [DetectLocalByCIDR (10.240.0.0/13)] Jump Not Local: [-m comment --comment "anthos-identity-service/ais:info cluster IP" -m tcp -p tcp -d 172.26.118.2/32 --dport 9901 ! -s 10.240.0.0/13 -j KUBE-MARK-MASQ]
I1225 03:04:45.324067       1 proxier.go:1621] "Restoring iptables" rules=https://gist.github.com/yongxiu/81bfb3f8974e4aba29b4918bbccb3c82
I1225 03:04:45.324871       1 iptables.go:419] running iptables-restore [-w 5 -W 100000 --noflush --counters]
E1225 03:04:45.338528       1 proxier.go:1624] "Failed to execute iptables-restore" err="exit status 1 (iptables-restore: line 398 failed\n)"
I1225 03:04:45.339920       1 proxier.go:1627] "Closing local ports after iptables-restore failure"

The iptables rules are in https://gist.github.com/yongxiu/81bfb3f8974e4aba29b4918bbccb3c82

I tried to run the restore manually: iptables-restore -w 5 -W 100000 --noflush --counters < rule.txt --verbose

It succeeded only after I commented out these 3 lines in rule.txt:

# -X KUBE-SVC-G4P4IPQ4JUEESJSA
# -X KUBE-SVC-A32MGCDFPRQGQDBB
# -X KUBE-SVC-JTXKX5D7NT2O6RLC

How can I debug the root cause?
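One way to narrow this down (a sketch, assuming rule.txt is the file fed to iptables-restore, and using one of the three chain names from the commented-out lines as an example):

# Show the exact line iptables-restore rejected; the number counts from the top
# of rule.txt, including the *nat header and COMMIT lines.
sed -n '398p' rule.txt

# Check whether a chain with that name already exists in the nat table,
# whether it still holds rules, and whether other chains still jump to it.
iptables -t nat -S KUBE-SVC-G4P4IPQ4JUEESJSA
iptables-save -t nat | grep -- '-j KUBE-SVC-G4P4IPQ4JUEESJSA'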

https://gist.github.com/yongxiu/8e85219ef90d8e8b3042b96374b7a60e is the output of iptables-save. It looks like the restore fails when a chain with the same name already exists.
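If the failing lines really are the three -X lines, that would be consistent with how chain deletion works: -X CHAIN in an iptables-restore input asks for the chain to be deleted, and (at least with --noflush) the delete fails while the chain still contains rules or is still referenced by a -j from another chain. A quick check, again using one of the three chain names as an example:

# Rules still inside the chain (must be 0 before -X can succeed)
iptables -t nat -S KUBE-SVC-G4P4IPQ4JUEESJSA | grep -c '^-A'

# Jumps into the chain from other chains (must also be 0)
iptables-save -t nat | grep -c -- '-j KUBE-SVC-G4P4IPQ4JUEESJSA'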

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.