AddressBlocks are not freed when pods are scheduled on master nodes
tflabs-nl opened this issue · 10 comments
Describe the bug
I had a CoreDNS pod that was being scheduled on a master node over and over, failing each time due to a CNI version incompatibility. Every restart resulted in a new AddressBlock reservation. The blocks were not cleared after each failed attempt, and the full /16 is now used up, leaving pods stuck in the creating phase.
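For context, the leak is visible by listing the AddressBlock custom resources; each failed restart leaves one behind. A rough way to check (assuming Coil v2, whose CRDs live in the coil.cybozu.com group):

# list the reserved blocks; the leaked ones pile up on the master node
kubectl get addressblocks.coil.cybozu.com
# count them to see how much of the /16 is consumed
kubectl get addressblocks.coil.cybozu.com --no-headers | wc -l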
Environments
- OS: Ubuntu 20.04
Expected behavior
Blocks should be cleared on pod finalization, even on master nodes.
Manual deletion of an AddressBlock also fails, with no errors in the coil-controller or coild pods.
Is there a way I can force delete (free) some blocks by hand? All (re)scheduled pods fail at this point.
Also, the total number of pods running on this cluster is 109.
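For what it's worth, the stuck deletion looks like a finalizer issue: if the block carries a finalizer (an assumption about this Coil version, not something I verified), kubectl delete will not complete until the controller removes it. Something like this shows whether that is the case:

# <block-name> is a placeholder for one of the stuck blocks
kubectl get addressblocks.coil.cybozu.com <block-name> -o jsonpath='{.metadata.finalizers}'
# a set deletionTimestamp plus remaining finalizers means the delete is pending on the controller
kubectl get addressblocks.coil.cybozu.com <block-name> -o jsonpath='{.metadata.deletionTimestamp}'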
Is there a way I can force delete (free) some blocks by hand?

Try restarting the coild running on the master node with kubectl delete pod. coild frees unused blocks when it starts.
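For example, assuming coild runs as a DaemonSet in kube-system (the pod name below is a placeholder; adjust to your installation):

# find the coild pod running on the affected master node
kubectl -n kube-system get pods -o wide | grep coild
# delete it; the DaemonSet recreates it, and it frees unused blocks on startup
kubectl -n kube-system delete pod <coild-pod-on-master-node>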
Can you reproduce this issue with Kind or something? If you can, please tell me how to do that.
I think all that's needed is the default IP address pool as mentioned in the documentation (sketched after the manifest below), as well as this CoreDNS YAML:
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  labels:
    k8s-app: kube-dns
  name: coredns
  namespace: kube-system
spec:
  progressDeadlineSeconds: 600
  replicas: 3
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: kube-dns
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      annotations:
        egress.coil.cybozu.com/webserver-internet: nat
      creationTimestamp: null
      labels:
        k8s-app: kube-dns
    spec:
      containers:
      - args:
        - -conf
        - /etc/coredns/Corefile
        image: k8s.gcr.io/coredns/coredns:v1.8.6
        imagePullPolicy: Always
        livenessProbe:
          failureThreshold: 5
          httpGet:
            path: /health
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 60
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        name: coredns
        ports:
        - containerPort: 53
          name: dns
          protocol: UDP
        - containerPort: 53
          name: dns-tcp
          protocol: TCP
        - containerPort: 9153
          name: metrics
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /ready
            port: 8181
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            memory: 170Mi
          requests:
            cpu: 100m
            memory: 70Mi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            add:
            - NET_BIND_SERVICE
            drop:
            - all
          readOnlyRootFilesystem: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/coredns
          name: config-volume
          readOnly: true
      dnsPolicy: None
      dnsConfig:
        nameservers:
        - 1.1.1.1
        - 8.8.8.8
      nodeSelector:
        kubernetes.io/os: linux
      priorityClassName: system-cluster-critical
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: coredns
      serviceAccountName: coredns
      terminationGracePeriodSeconds: 30
      tolerations:
      - key: CriticalAddonsOnly
        operator: Exists
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
      - effect: NoSchedule
        key: node-role.kubernetes.io/control-plane
      volumes:
      - configMap:
          defaultMode: 420
          items:
          - key: Corefile
            path: Corefile
          name: coredns
        name: config-volume
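For completeness, the default pool followed the documentation and looked roughly like this (a sketch; the subnet and block size here are placeholders, not the exact values from my cluster):

apiVersion: coil.cybozu.com/v2
kind: AddressPool
metadata:
  name: default
spec:
  # placeholder values; the real pool used the documented defaults
  blockSizeBits: 5
  subnets:
    - ipv4: 10.100.0.0/16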
I would have to try to replicate this behavior in Kind; I've never used it before.
Thanks, will look into that.
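For reference, a minimal Kind config for such a test could look like this (a sketch; disabling the default CNI so Coil can be installed, and the one-control-plane-plus-one-worker layout, are assumptions about the setup):

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  # no default CNI, so Coil can be installed instead
  disableDefaultCNI: true
nodes:
  - role: control-plane
  - role: worker

Then kind create cluster --config kind.yaml brings the cluster up.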
We couldn't reproduce this issue.
Could you provide me with more details on how you encountered this?
I will try to reproduce this issue using Kind myself this weekend. It all came down to CoreDNS trying to schedule on a master node, which failed due to an invalid CNI version; this resulted in a crash loop that created lots of address blocks.
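If it helps, the block growth should be observable while the pod crash-loops, for example (assuming the Coil v2 CRD group name):

# watch AddressBlocks accumulate while CoreDNS restarts on the master node
watch -n 5 'kubectl get addressblocks.coil.cybozu.com --no-headers | wc -l'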
Feel free to reopen this issue if you still have a problem.