OKD 3.11 - Multiple services pointing to the same ready pod take ages to be accessible
ThinkBriK opened this issue · 5 comments
Description
On a multi-master install, creating a simple pod with multiple services pointing to it causes the services to become reachable very late compared to the moment the pod gets ready.
Version
OpenShift 3.11
Steps To Reproduce
Deploy the following resources:
# Dummy Server
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: test-http
spec:
  replicas: 1
  selector:
    matchLabels:
      run: test-http
      app: test
  strategy:
    type: Recreate
  template:
    metadata:
      creationTimestamp: null
      labels:
        run: test-http
        app: test
    spec:
      containers:
      - args:
        - httpd
        - -fv
        image: busybox
        imagePullPolicy: Always
        name: test-http
        readinessProbe:
          exec:
            command:
            - /bin/echo
            - toto
          failureThreshold: 3
          # delayed readiness
          initialDelaySeconds: 15
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 1
      restartPolicy: Always
      terminationGracePeriodSeconds: 0
---
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: null
  labels:
    run: test-http
    app: test
  name: test
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: test
  type: ClusterIP
---
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: null
  labels:
    run: test-http
    app: test
  name: test2
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: test
  type: ClusterIP
---
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: null
  labels:
    run: test-http
    app: test
  name: test3
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: test
  type: ClusterIP
---
# Tester pod
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    run: test-http
    app: tester
  name: tester
spec:
  replicas: 1
  selector:
    matchLabels:
      run: test-http
      app: tester
  strategy:
    type: Recreate
  template:
    metadata:
      creationTimestamp: null
      labels:
        run: test-http
        app: tester
    spec:
      containers:
      - args:
        - /bin/sh
        - -c
        - while true; do wget -T 5 test; sleep 1; wget -T 5 test2; sleep 1; wget -T 5 test3; sleep 5; done
        image: busybox
        imagePullPolicy: Always
        name: tester
      restartPolicy: Always
      terminationGracePeriodSeconds: 0
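To reproduce, apply the manifests and follow the tester output; something along these lines should work (the file name and project name are just examples):

oc new-project mynamespace                          # any test project
oc apply -f repro.yaml                              # the manifests above
oc get pods -w                                      # wait for the test-http pod to report 1/1 Ready
oc logs -f $(oc get pods -l app=tester -o name)     # watch the wget results for test/test2/test3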
Expected Results
The tester logs should be full of 404 responses, and the test-http httpd should record the corresponding requests:
~ oc logs tester-6d5b5f5b45-bwpf8
Connecting to test (172.30.5.0:80)
wget: server returned error: HTTP/1.0 404 Not Found
Connecting to test2 (172.30.166.168:80)
wget: server returned error: HTTP/1.0 404 Not Found
Connecting to test3 (172.30.160.79:80)
wget: server returned error: HTTP/1.0 404 Not Found
Observed Results
Instead, I'm getting "No route to host" for all services except one initially; the other services only become reachable gradually, long after the pod is ready:
Connecting to test (172.30.5.0:80)
wget: can't connect to remote host (172.30.5.0): No route to host
Connecting to test2 (172.30.166.168:80)
wget: server returned error: HTTP/1.0 404 Not Found
Connecting to test3 (172.30.160.79:80)
wget: can't connect to remote host (172.30.160.79): No route to host
Connecting to test (172.30.5.0:80)
wget: can't connect to remote host (172.30.5.0): No route to host
Connecting to test2 (172.30.166.168:80)
wget: server returned error: HTTP/1.0 404 Not Found
Connecting to test3 (172.30.160.79:80)
...
Additional Information
- Your operating system and version: CentOS Linux release 7.7.1908 (Core)
It seems that iptables rules are created for only a single service:
# iptables -t nat -L | grep 'mynamespace/test'
KUBE-MARK-MASQ all -- ip-10-130-140-7.eu-west-3.compute.internal anywhere /* mynamespace/test2:http */
DNAT tcp -- anywhere anywhere /* mynamespace/test2:http */ tcp to:10.130.140.7:80
KUBE-MARK-MASQ tcp -- !ip-10-128-0-0.eu-west-3.compute.internal/14 test2.mynamespace.svc.cluster.local /* mynamespace/test2:http cluster IP */ tcp dpt:http
KUBE-SVC-4EDXJG65XWML2DSW tcp -- anywhere test2.mynamespace.svc.cluster.local /* mynamespace/test2:http cluster IP */ tcp dpt:http
KUBE-SEP-MPCOJF5FIFQHGSLF all -- anywhere anywhere /* mynamespace/test2:http */
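A quicker way to compare which services actually got KUBE-SVC/KUBE-SEP chains is to grep the saved ruleset instead of iptables -L; roughly like this, on the node as root (a sketch, not the exact commands from the report):

# iptables-save | grep 'mynamespace/test:http'      # rules programmed for the "test" service
# iptables-save | grep 'mynamespace/test2:http'     # rules programmed for "test2"
# iptables-save | grep 'has no endpoints'           # cluster IPs currently REJECTed for lack of endpoints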
Update:
It turns out that the endpoints never get set up for the non-working services!
Log of a working service:
{"log":"I1020 08:57:22.785503 84796 proxy.go:261] hybrid proxy: (always) update ep mynamespace/test3 in unidling proxy\n","stream":"stderr","time":"2020-10-20T08:57:22.78589115Z"}
{"log":"I1020 08:57:22.785508 84796 roundrobin.go:310] LoadBalancerRR: Setting endpoints for mynamespace/test3:http to [10.130.140.27:80]\n","stream":"stderr","time":"2020-10-20T08:57:22.785894097Z"}
{"log":"I1020 08:57:22.785513 84796 roundrobin.go:240] Delete endpoint 10.130.140.27:80 for service \"mynamespace/test3:http\"\n","stream":"stderr","time":"2020-10-20T08:57:22.785896856Z"}
{"log":"I1020 08:57:22.785519 84796 proxy.go:283] hybrid proxy: update ep mynamespace/test3 in main proxy\n","stream":"stderr","time":"2020-10-20T08:57:22.785899713Z"}
{"log":"I1020 08:57:22.785525 84796 endpoints.go:234] Setting endpoints for \"mynamespace/test3:http\" to []\n","stream":"stderr","time":"2020-10-20T08:57:22.785902388Z"}
{"log":"I1020 08:57:22.785535 84796 endpoints.go:234] Setting endpoints for \"mynamespace/test3:http\" to [10.130.140.27:80]\n","stream":"stderr","time":"2020-10-20T08:57:22.785905229Z"}
{"log":"-A KUBE-SERVICES -m comment --comment \"mynamespace/test3:http cluster IP\" -m tcp -p tcp -d 172.30.38.204/32 --dport 80 ! -s 10.128.0.0/14 -j KUBE-MARK-MASQ\n","stream":"stderr","time":"2020-10-20T08:57:22.81362588Z"}
{"log":"-A KUBE-SERVICES -m comment --comment \"mynamespace/test3:http cluster IP\" -m tcp -p tcp -d 172.30.38.204/32 --dport 80 -j KUBE-SVC-IKGWW46TKWZ3CTQH\n","stream":"stderr","time":"2020-10-20T08:57:22.813628775Z"}
{"log":"-A KUBE-SVC-IKGWW46TKWZ3CTQH -m comment --comment mynamespace/test3:http -j KUBE-SEP-Z35DPKKXT4TOKDBN\n","stream":"stderr","time":"2020-10-20T08:57:22.81363166Z"}
{"log":"-A KUBE-SEP-Z35DPKKXT4TOKDBN -m comment --comment mynamespace/test3:http -s 10.130.140.27/32 -j KUBE-MARK-MASQ\n","stream":"stderr","time":"2020-10-20T08:57:22.81363616Z"}
{"log":"-A KUBE-SEP-Z35DPKKXT4TOKDBN -m comment --comment mynamespace/test3:http -m tcp -p tcp -j DNAT --to-destination 10.130.140.27:80\n","stream":"stderr","time":"2020-10-20T08:57:22.813638985Z"}
{"log":"I1020 08:57:22.821700 84796 healthcheck.go:235] Not saving endpoints for unknown healthcheck \"mynamespace/test3\"\n","stream":"stderr","time":"2020-10-20T08:57:22.821765899Z"}
{"log":"I1020 08:57:29.425753 84796 serviceresolver.go:161] Answered test3.mynamespace.svc.cluster.local.:false with msg.Service{Host:\"172.30.38.204\", Port:0, Priority:10, Weight:10, Text:\"\", Mail:false, Ttl:0x1e, TargetStrip:0, Group:\"\", Key:\"/skydns/local/cluster/svc/mynamespace/test3/ec1415c3\"}\n","stream":"stderr","time":"2020-10-20T08:57:29.425804747Z"}
{"log":"I1020 08:57:29.426074 84796 serviceresolver.go:161] Answered test3.mynamespace.svc.cluster.local.:false with msg.Service{Host:\"172.30.38.204\", Port:0, Priority:10, Weight:10, Text:\"\", Mail:false, Ttl:0x1e, TargetStrip:0, Group:\"\", Key:\"/skydns/local/cluster/svc/mynamespace/test3/ec1415c3\"}\n","stream":"stderr","time":"2020-10-20T08:57:29.42610662Z"}
Log of a non-working service:
{"log":"I1020 08:57:22.787821 84796 proxy.go:261] hybrid proxy: (always) update ep mynamespace/test2 in unidling proxy\n","stream":"stderr","time":"2020-10-20T08:57:22.78786746Z"}
{"log":"I1020 08:57:22.787834 84796 roundrobin.go:310] LoadBalancerRR: Setting endpoints for mynamespace/test2:http to [10.130.140.27:80]\n","stream":"stderr","time":"2020-10-20T08:57:22.78787164Z"}
{"log":"I1020 08:57:22.787846 84796 roundrobin.go:240] Delete endpoint 10.130.140.27:80 for service \"mynamespace/test2:http\"\n","stream":"stderr","time":"2020-10-20T08:57:22.787883266Z"}
{"log":"I1020 08:57:22.787866 84796 proxy.go:283] hybrid proxy: update ep mynamespace/test2 in main proxy\n","stream":"stderr","time":"2020-10-20T08:57:22.787889936Z"}
{"log":"I1020 08:57:22.787877 84796 endpoints.go:234] Setting endpoints for \"mynamespace/test2:http\" to []\n","stream":"stderr","time":"2020-10-20T08:57:22.78789457Z"}
{"log":"I1020 08:57:22.787893 84796 endpoints.go:234] Setting endpoints for \"mynamespace/test2:http\" to [10.130.140.27:80]\n","stream":"stderr","time":"2020-10-20T08:57:22.787929726Z"}
{"log":"-A KUBE-SERVICES -m comment --comment \"mynamespace/test2:http has no endpoints\" -m tcp -p tcp -d 172.30.120.81/32 --dport 80 -j REJECT\n","stream":"stderr","time":"2020-10-20T08:57:22.811561709Z"}
{"log":"I1020 08:57:25.417747 84796 serviceresolver.go:161] Answered test2.mynamespace.svc.cluster.local.:false with msg.Service{Host:\"172.30.120.81\", Port:0, Priority:10, Weight:10, Text:\"\", Mail:false, Ttl:0x1e, TargetStrip:0, Group:\"\", Key:\"/skydns/local/cluster/svc/mynamespace/test2/bd537d1a\"}\n","stream":"stderr","time":"2020-10-20T08:57:25.417810313Z"}
{"log":"I1020 08:57:25.418124 84796 serviceresolver.go:161] Answered test2.mynamespace.svc.cluster.local.:false with msg.Service{Host:\"172.30.120.81\", Port:0, Priority:10, Weight:10, Text:\"\", Mail:false, Ttl:0x1e, TargetStrip:0, Group:\"\", Key:\"/skydns/local/cluster/svc/mynamespace/test2/bd537d1a\"}\n","stream":"stderr","time":"2020-10-20T08:57:25.418163831Z"}
The endpoints are not reflected correctly in the iptables rules when the services are created (the Endpoints API objects do exist).
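For completeness, the API side can be checked with something like this (service names and namespace as in the manifests above):

oc get endpoints test test2 test3 -o wide     # all three Endpoints objects list the pod IP once it is Ready
oc describe svc test3 | grep -i endpoints     # same information, per service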
Is there any means of retrieving the kube-proxy logs? (Found them and updated the first post.)
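For anyone else looking for them: on a default OKD 3.11 install the kube-proxy code runs inside the openshift-sdn pods, so the messages above can usually be pulled with something like the following (the pod name is a placeholder):

oc -n openshift-sdn get pods -o wide     # find the sdn pod running on the affected node
oc -n openshift-sdn logs sdn-xxxxx       # replace with the actual sdn pod name; the proxy.go/endpoints.go lines are in here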
Issues go stale after 90d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle rotten
/remove-lifecycle stale
Rotten issues close after 30d of inactivity.
Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.
/close
@openshift-bot: Closing this issue.
In response to this:
Rotten issues close after 30d of inactivity.
Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.