fnserver container has no access to the internal cluster network - cannot ping the MySQL database
christiancadieux opened this issue · 0 comments
Description
The fnserver container has no access to the internal cluster network - it cannot ping the MySQL database.
Steps to reproduce the issue:
- Start Fn in a Kubernetes 1.19.15 bare-metal cluster on Flatcar Container Linux (kernel 5.4).
Describe the results you received:
WARNINGS WHEN FNSERVER STARTS:
time="2022-03-13T19:08:10.893860840Z" level=warning msg="[!] DON'T BIND ON ANY IP ADDRESS WITHOUT setting --tlsverify IF YOU DON'T KNOW WHAT YOU'RE DOING [!]"
time="2022-03-13T19:08:10.895399404Z" level=info msg="libcontainerd: started new docker-containerd process" pid=36
time="2022-03-13T19:08:10Z" level=info msg="starting containerd" module=containerd revision=89623f28b87a6004d4b785663257362d1658a729 version=v1.0.0
time="2022-03-13T19:08:10Z" level=info msg="setting subreaper..." module=containerd
time="2022-03-13T19:08:10Z" level=info msg="changing OOM score to -500" module=containerd
time="2022-03-13T19:08:10Z" level=info msg="loading plugin "io.containerd.content.v1.content"..." module=containerd type=io.containerd.content.v1
time="2022-03-13T19:08:10Z" level=info msg="loading plugin "io.containerd.snapshotter.v1.btrfs"..." module=containerd type=io.containerd.snapshotter.v1
time="2022-03-13T19:08:10Z" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.btrfs" error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.btrfs must be a btrfs filesystem to be used with the btrfs snapshotter" module=containerd
time="2022-03-13T19:08:10Z" level=info msg="loading plugin "io.containerd.snapshotter.v1.overlayfs"..." module=containerd type=io.containerd.snapshotter.v1
time="2022-03-13T19:08:10Z" level=info msg="loading plugin "io.containerd.metadata.v1.bolt"..." module=containerd type=io.containerd.metadata.v1
time="2022-03-13T19:08:10Z" level=warning msg="could not use snapshotter btrfs in metadata plugin" error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.btrfs must be a btrfs filesystem to be used with the btrfs snapshotter" module="containerd/io.containerd.metadata.v1.bolt"
time="2022-03-13T19:08:10Z" level=info msg="loading plugin "io.containerd.differ.v1.walking"..." module=containerd type=io.containerd.differ.v1
time="2022-03-13T19:08:10Z" level=info msg="loading plugin "io.containerd.gc.v1.scheduler"..." module=containerd type=io.containerd.gc.v1
time="2022-03-13T19:08:10Z" level=info msg="loading plugin "io.containerd.grpc.v1.containers"..." module=containerd type=io.containerd.grpc.v1
time="2022-03-13T19:08:10Z" level=info msg="loading plugin "io.containerd.grpc.v1.content"..." module=containerd type=io.containerd.grpc.v1
time="2022-03-13T19:08:10Z" level=info msg="loading plugin "io.containerd.grpc.v1.diff"..." module=containerd type=io.containerd.grpc.v1
time="2022-03-13T19:08:10Z" level=info msg="loading plugin "io.containerd.grpc.v1.events"..." module=containerd type=io.containerd.grpc.v1
time="2022-03-13T19:08:10Z" level=info msg="loading plugin "io.containerd.grpc.v1.healthcheck"..." module=containerd type=io.containerd.grpc.v1
time="2022-03-13T19:08:10Z" level=info msg="loading plugin "io.containerd.grpc.v1.images"..." module=containerd type=io.containerd.grpc.v1
time="2022-03-13T19:08:10Z" level=info msg="loading plugin "io.containerd.grpc.v1.leases"..." module=containerd type=io.containerd.grpc.v1
time="2022-03-13T19:08:10Z" level=info msg="loading plugin "io.containerd.grpc.v1.namespaces"..." module=containerd type=io.containerd.grpc.v1
time="2022-03-13T19:08:10Z" level=info msg="loading plugin "io.containerd.grpc.v1.snapshots"..." module=containerd type=io.containerd.grpc.v1
time="2022-03-13T19:08:10Z" level=info msg="loading plugin "io.containerd.monitor.v1.cgroups"..." module=containerd type=io.containerd.monitor.v1
time="2022-03-13T19:08:10Z" level=info msg="loading plugin "io.containerd.runtime.v1.linux"..." module=containerd type=io.containerd.runtime.v1
time="2022-03-13T19:08:10Z" level=info msg="loading plugin "io.containerd.grpc.v1.tasks"..." module=containerd type=io.containerd.grpc.v1
time="2022-03-13T19:08:10Z" level=info msg="loading plugin "io.containerd.grpc.v1.version"..." module=containerd type=io.containerd.grpc.v1
time="2022-03-13T19:08:10Z" level=info msg="loading plugin "io.containerd.grpc.v1.introspection"..." module=containerd type=io.containerd.grpc.v1
time="2022-03-13T19:08:10Z" level=info msg=serving... address="/var/run/docker/containerd/docker-containerd-debug.sock" module="containerd/debug"
time="2022-03-13T19:08:10Z" level=info msg=serving... address="/var/run/docker/containerd/docker-containerd.sock" module="containerd/grpc"
time="2022-03-13T19:08:10Z" level=info msg="containerd successfully booted in 0.076981s" module=containerd
time="2022-03-13T19:08:11.091836012Z" level=info msg="Graph migration to content-addressability took 0.00 seconds"
time="2022-03-13T19:08:11.092238813Z" level=warning msg="Your kernel does not support cgroup blkio weight"
time="2022-03-13T19:08:11.092270582Z" level=warning msg="Your kernel does not support cgroup blkio weight_device"
time="2022-03-13T19:08:11.093089636Z" level=info msg="Loading containers: start."
time="2022-03-13T19:08:11.098728143Z" level=warning msg="Running modprobe nf_nat failed with message: `ip: can't find device 'nf_nat'\nnf_nat 45056 4 ip6table_nat,xt_nat,xt_MASQUERADE,iptable_nat\nnf_conntrack 135168 7 xt_CT,nf_conntrack_netlink,xt_nat,xt_MASQUERADE,xt_conntrack,ip_vs,nf_nat\nlibcrc32c 16384 3 ip_vs,nf_nat,nf_conntrack\nmodprobe: can't change directory to '/lib/modules': No such file or directory`, error: exit status 1"
time="2022-03-13T19:08:11.103532819Z" level=warning msg="Running modprobe xt_conntrack failed with message: `ip: can't find device 'xt_conntrack'\nxt_conntrack 16384 301 \nnf_conntrack 135168 7 xt_CT,nf_conntrack_netlink,xt_nat,xt_MASQUERADE,xt_conntrack,ip_vs,nf_nat\nmodprobe: can't change directory to '/lib/modules': No such file or directory`, error: exit status 1"
time="2022-03-13T19:08:11.320902701Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
time="2022-03-13T19:08:11.410758897Z" level=info msg="Loading containers: done."
time="2022-03-13T19:08:11.490784913Z" level=info msg="Docker daemon" commit=c97c6d6 graphdriver(s)=overlay2 version=17.12.0-ce
time="2022-03-13T19:08:11.491124348Z" level=info msg="Daemon has completed initialization"
time="2022-03-13T19:08:11.505286843Z" level=info msg="API listen on [::]:2375"
time="2022-03-13T19:08:11.505392148Z" level=info msg="API listen on /var/run/docker.sock"
time="2022-03-13T19:08:13Z" level=info msg="Setting log level to" fields.level=DEBUG
time="2022-03-13T19:08:13Z" level=info msg="Registering data store provider 'sql'"
server NewFromEnv
server.New
time="2022-03-13T19:08:13Z" level=info msg="using LB Base URL: 'http://rdeifn.lb.fn.internal:90'"
time="2022-03-13T19:08:13Z" level=debug msg="creating new datastore" db=mysql
time="2022-03-13T19:08:13Z" level=info msg="Connecting to DB" url="mysql://fnapp:boomsauce@tcp(rdeifn-mysql:3306)/fndb"
*** HANGS HERE - CANNOT PING DATABASE
kernel
/app # uname -a
Linux rdeifn-fn-847478b4bc-76cgh 5.4.77-flatcar #1 SMP Wed Nov 18 17:29:43 -00 2020 x86_64 Linux
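Checks like the following can narrow down whether the hang at the DB connection is a DNS problem or a routing problem. The service name, namespace, and pod names are the ones from the spec and pod listing below; the tooling (busybox nslookup/nc) is assumed to be present in the images:
Inside the api container:
/app # nslookup rdeifn-mysql            # does the service name resolve to a ClusterIP?
/app # nc -w 3 rdeifn-mysql 3306        # TCP connect: either opens immediately or times out
From another pod in the same namespace:
$ kubectl -n namespace-test exec -it r-sts-tools-0 -- nc -w 3 rdeifn-mysql 3306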
Describe the results you expected:
The fnserver container needs to connect to the MySQL DB. Other pods in the same namespace can connect to the same MySQL DB without issue.
Additional information you deem important (e.g. issue happens only occasionally):
Output of fn version
(CLI command):
Client version is latest version: 0.6.17
Server version: ? <<< fnserver is not ready.
Additional environment details (OSX, Linux, flags, etc.):
$ k logs -f rdeifn-fn-847478b4bc-76cgh -c runner-lb
/usr/local/bin/preentry.sh: set: line 14: can't access tty; job control turned off
time="2022-03-13T18:46:17.094678038Z" level=warning msg="[!] DON'T BIND ON ANY IP ADDRESS WITHOUT setting --tlsverify IF YOU DON'T KNOW WHAT YOU'RE DOING [!]"
listen tcp 0.0.0.0:2375: bind: address already in use
time="2022-03-13T18:46:20Z" level=info msg="Setting log level to" fields.level=INFO
time="2022-03-13T18:46:20Z" level=info msg="Registering data store provider 'sql'"
server NewFromEnv
server.New
time="2022-03-13T18:46:20Z" level=info msg="Starting static runner pool" runners="[rdeifn-fn-runner.cadieux2.svc.cluster.local:9191]"
time="2022-03-13T18:46:20Z" level=info msg="Connected to runner" runner_addr="rdeifn-fn-runner.cadieux2.svc.cluster.local:9191"
time="2022-03-13T18:46:20Z" level=info msg="Creating new naive runnerpool placer with config=&{RetryAllDelay:10ms PlacerTimeout:6m0s DetachedPlacerTimeout:30s}"
time="2022-03-13T18:46:20Z" level=info msg="lb-agent starting cfg={MinDockerVersion:17.10.0-ce ContainerLabelTag: DockerNetworks: DockerLoadFile: DisableUnprivilegedContainers:false FreezeIdle:50ms HotPoll:200ms HotLauncherTimeout:1h0m0s HotPullTimeout:10m0s HotStartTimeout:5s DetachedHeadRoom:6m0s MaxResponseSize:0 MaxHdrResponseSize:0 MaxLogSize:1048576 MaxTotalCPU:0 MaxTotalMemory:0 MaxFsSize:0 MaxPIDs:50 MaxOpenFiles:0xc4201d4c00 MaxLockedMemory:0xc4201d4c08 MaxPendingSignals:0xc4201d4c10 MaxMessageQueue:0xc4201d4c18 PreForkPoolSize:0 PreForkImage:busybox PreForkCmd:tail -f /dev/null PreForkUseOnce:0 PreForkNetworks: EnableNBResourceTracker:false MaxTmpFsInodes:0 DisableReadOnlyRootFs:false DisableDebugUserLogs:false IOFSEnableTmpfs:false EnableFDKDebugInfo:false IOFSAgentPath: IOFSMountRoot: IOFSOpts: ImageCleanMaxSize:0 ImageCleanExemptTags: ImageEnableVolume:false}"
server.New completed
funcServer.Start
server.Start
time="2022-03-13T18:46:20Z" level=info msg="\n ______\n / ____/___\n / /_ / __ \\\n / __/ / / / /\n /_/ /_/ /_/\n"
time="2022-03-13T18:46:20Z" level=info msg="Fn serving on `:90`" type=lb version=0.3.749
time="2022-03-13T18:46:25Z" level=warning msg="Created insecure grpc connection" grpc_addr="rdeifn-fn-runner.cadieux2.svc.cluster.local:9191" runner_addr="rdeifn-fn-runner.cadieux2.svc.cluster.local:9191"
All other pods work:
NAME READY STATUS RESTARTS AGE
ingress-nginx-controller-857c6b8d6c-vbtbj 1/1 Running 0 131m
r-dep-nginx-1-7ff774bfb5-mr76f 1/1 Running 0 19h
r-sts-tools-0 1/1 Running 0 12h
rdeifn-fn-847478b4bc-76cgh 1/2 Running 1 34m
rdeifn-fn-flow-depl-67db6765bc-cql2j 1/1 Running 0 12h
rdeifn-fn-runner-5cc448f875-fgbgv 1/1 Running 0 13h
rdeifn-fn-runner-5cc448f875-lb7b5 1/1 Running 0 13h
rdeifn-fn-runner-5cc448f875-q5dvs 1/1 Running 0 13h
rdeifn-fn-ui-7777796869-n76wz 1/1 Running 0 13h
rdeifn-mysql-765fb6dc7-8vlh9 1/1 Running 0 13h
rdeifn-redis-57fd48cf5b-zv5wz 1/1 Running 0 13h
Tried the IP of the mysql service directly - it also hangs.
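It is also worth confirming that the service actually has endpoints behind it (i.e. the selector matches the mysql pod); service name and namespace are the ones from the spec below:
$ kubectl -n namespace-test get svc,endpoints rdeifn-mysql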
networking
/app # ifconfig -a
docker0 Link encap:Ethernet HWaddr 02:42:7C:9E:E8:76
inet addr:172.17.0.1 Bcast:172.17.255.255 Mask:255.255.0.0
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
eth0 Link encap:Ethernet HWaddr A6:B9:7B:48:D6:E2
inet addr:192.168.92.200 Bcast:0.0.0.0 Mask:255.255.255.255
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:876 errors:0 dropped:0 overruns:0 frame:0
TX packets:996 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:72298 (70.6 KiB) TX bytes:60142 (58.7 KiB)
ip6tnl0 Link encap:UNSPEC HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
NOARP MTU:1452 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
tunl0 Link encap:UNSPEC HWaddr 00-00-00-00-00-00-9A-19-00-00-00-00-00-00-00-00
NOARP MTU:1480 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
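Given the /32 mask on eth0 (typical of a Calico-style CNI), the pod's route table is also worth checking from inside the container; with a Calico-style setup the expected output is a default route via 169.254.1.1 on eth0 (assuming the busybox ip/route applets are available):
/app # ip route
/app # route -n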
kube spec:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: rdeifn-fn
    chart: fn-0.1.0
    heritage: Helm
    iproject: oracle-fn
    release: rdeifn
  name: rdeifn-fn
  namespace: namespace-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: rdeifn-fn
      chart: fn-0.1.0
      heritage: Helm
      iproject: oracle-fn
      release: rdeifn
      role: fn-service
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: rdeifn-fn
        chart: fn-0.1.0
        heritage: Helm
        iproject: oracle-fn
        release: rdeifn
        role: fn-service
    spec:
      affinity: {}
      containers:
      - env:
        - name: FN_DB_PASSWD
          valueFrom:
            secretKeyRef:
              key: mysql-password
              name: rdeifn-mysql
        - name: FN_DB_HOST
          value: rdeifn-mysql
        - name: FN_MQ_HOST
          value: rdeifn-redis
        - name: FN_PORT
          value: "80"
        - name: FN_NODE_TYPE
          value: api
        - name: FN_PUBLIC_LB_URL
          value: http://rdeifn.lb.fn.internal:90
        - name: FN_DB_URL
          value: mysql://fnapp:$(FN_DB_PASSWD)@tcp($(FN_DB_HOST):3306)/fndb
        - name: FN_LOG_LEVEL
          value: DEBUG
        - name: FN_MQ_URL
          value: redis://$(FN_MQ_HOST):6379/
        image: hub.comcast.net/k8s-eng/rdei-ide/fnproject/fnserver:cc
        imagePullPolicy: Always
        name: api
        ports:
        - containerPort: 80
          name: p80
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /v2/apps
            port: 80
            scheme: HTTP
          initialDelaySeconds: 3
          periodSeconds: 3
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            cpu: 300m
            memory: 2Gi
          requests:
            cpu: 150m
            memory: 512Mi
        securityContext:
          privileged: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      - env:
        - name: FN_NODE_TYPE
          value: lb
        - name: FN_GRPC_PORT
          value: "9191"
        - name: FN_PORT
          value: "90"
        - name: FN_RUNNER_API_URL
          value: http://rdeifn-fn.namespace-test.svc.cluster.local:80
        - name: FN_RUNNER_ADDRESSES
          value: rdeifn-fn-runner.namespace-test.svc.cluster.local:9191
        - name: FN_LOG_LEVEL
          value: INFO
        image: hub.comcast.net/k8s-eng/rdei-ide/fnproject/fnserver:cc
        imagePullPolicy: Always
        name: runner-lb
        ports:
        - containerPort: 90
          name: p90
          protocol: TCP
        resources:
          limits:
            cpu: 300m
            memory: 2Gi
          requests:
            cpu: 150m
            memory: 512Mi
        securityContext:
          privileged: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: default
      serviceAccountName: default
UPDATE
So I restarted the fnserver pod with hostNetwork: true, which crashed the pod; I then removed hostNetwork and restarted the pod again, and now it works. It looks like running fnserver against the host network reconfigured something on the host that fixed the problem, but I'm not sure.
Anyway, in my case this problem, which I don't understand, is gone.
NAME READY STATUS RESTARTS AGE
ingress-nginx-controller-857c6b8d6c-vbtbj 1/1 Running 0 25h
r-sts-tools-0 1/1 Running 1 36h
rdeifn-fn-557c5bd749-9z2vf 2/2 Running 0 16h
rdeifn-fn-flow-depl-67db6765bc-cql2j 1/1 Running 0 35h
rdeifn-fn-runner-5cc448f875-fgbgv 1/1 Running 0 37h
rdeifn-fn-runner-5cc448f875-lb7b5 1/1 Running 0 37h
rdeifn-fn-runner-5cc448f875-q5dvs 1/1 Running 0 37h
rdeifn-fn-ui-7777796869-n76wz 1/1 Running 0 37h
rdeifn-mysql-765fb6dc7-8vlh9 1/1 Running 0 37h
rdeifn-redis-57fd48cf5b-zv5wz 1/1 Running 0 37h
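For reference, the hostNetwork toggle described above can be applied and reverted with patches along these lines (a sketch only; deployment name and namespace are taken from the spec above, and kubectl edit works just as well):
$ kubectl -n namespace-test patch deployment rdeifn-fn --type merge \
    -p '{"spec":{"template":{"spec":{"hostNetwork":true}}}}'
$ kubectl -n namespace-test patch deployment rdeifn-fn --type merge \
    -p '{"spec":{"template":{"spec":{"hostNetwork":false}}}}'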