Pod Placement
ggrames opened this issue · 11 comments
Hi,
does this solution also work on OpenShift 4.9 and higher?
I have some problems concerning pod placement: x node(s) didn't match Pod's node affinity/selector.
It results in a pending Job instance.
Thank you for the info.
I just checked one of our clusters and the labels and tolerations match the configuration in backup-cronjob.yaml.
Could you provide the output of oc get nodes --show-labels so that we can compare?
Sorry for the delay
ocp-compute-01.my.domain.at Ready worker 2y298d v1.22.8+f34b40c allow-kafka-broker=true,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=ocp-compute-01.my.domain.at,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos
ocp-compute-02.my.domain.at Ready worker 2y298d v1.22.8+f34b40c allow-kafka-broker=true,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=ocp-compute-02.my.domain.at,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos
ocp-compute-03.my.domain.at Ready worker 2y298d v1.22.8+f34b40c allow-kafka-broker=true,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=ocp-compute-03.my.domain.at,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos
ocp-compute-04.my.domain.at Ready worker 2y298d v1.22.8+f34b40c beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=ocp-compute-04.my.domain.at,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos
ocp-compute-05.my.domain.at Ready worker 2y298d v1.22.8+f34b40c beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=ocp-compute-05.my.domain.at,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos
ocp-compute-06.my.domain.at Ready worker 2y298d v1.22.8+f34b40c beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=ocp-compute-06.my.domain.at,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos
ocp-control-01.my.domain.at Ready master 2y298d v1.22.8+f34b40c beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=ocp-control-01.my.domain.at,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.openshift.io/os_id=rhcos
ocp-control-02.my.domain.at Ready master 2y298d v1.22.8+f34b40c beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=ocp-control-02.my.domain.at,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.openshift.io/os_id=rhcos
ocp-control-03.my.domain.at Ready master 2y298d v1.22.8+f34b40c beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=ocp-control-03.my.domain.at,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.openshift.io/os_id=rhcos
ocp-infra-01.my.domain.at Ready infra 2y298d v1.22.8+f34b40c beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=ocp-infra-01.my.domain.at,kubernetes.io/os=linux,node-role.kubernetes.io/infra=,node.openshift.io/os_id=rhcos
ocp-infra-02.my.domain.at Ready infra 2y298d v1.22.8+f34b40c beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=ocp-infra-02.my.domain.at,kubernetes.io/os=linux,node-role.kubernetes.io/infra=,node.openshift.io/os_id=rhcos
ocp-infra-03.my.domain.at Ready infra 2y298d v1.22.8+f34b40c beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=ocp-infra-03.my.domain.at,kubernetes.io/os=linux,node-role.kubernetes.io/infra=,node.openshift.io/os_id=rhcos
ocp-infra-04.my.domain.at Ready infra 8d v1.22.8+f34b40c beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,etcdbackup=allowed,kubernetes.io/arch=amd64,kubernetes.io/hostname=ocp-infra-04.my.domain.at,kubernetes.io/os=linux,node-role.kubernetes.io/infra=,node.openshift.io/os_id=rhcos
Looking at the output, the node-role.kubernetes.io/master= label is present on all of the ocp-control-* nodes, so that should be fine. What is the exact error message you see in the events?
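For example, something along these lines should list the scheduling failures for the pending pod (the namespace is a placeholder for wherever your cron job runs):

```sh
# List recent scheduling failures; replace <namespace> with the
# namespace the backup cron job runs in.
oc -n <namespace> get events \
  --field-selector reason=FailedScheduling \
  --sort-by=.lastTimestamp
```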
24s Warning FailedScheduling pod/etcd-backup-manual-2023-03-02-10-10-47--1-6krt9 0/13 nodes are available: 13 node(s) didn't match Pod's node affinity/selector.
Maybe there are more restrictions on the 4.9 version of the cluster than on 4.7.
Can you paste the full YAML of the pod etcd-backup-manual-2023-03-02-10-10-47--1-6krt9? That should be possible with oc get pod/etcd-backup-manual-2023-03-02-10-10-47--1-6krt9 -o yaml.
```yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    openshift.io/scc: privileged
  creationTimestamp: "2023-03-02T09:10:52Z"
  generateName: etcd-backup-manual-2023-03-02-10-10-47--1-
  labels:
    controller-uid: 5fa59092-3471-4594-9800-2367542578ab
    job-name: etcd-backup-manual-2023-03-02-10-10-47
  name: etcd-backup-manual-2023-03-02-10-10-47--1-6krt9
  namespace: infra-services
  ownerReferences:
  - apiVersion: batch/v1
    blockOwnerDeletion: true
    controller: true
    kind: Job
    name: etcd-backup-manual-2023-03-02-10-10-47
    uid: 5fa59092-3471-4594-9800-2367542578ab
  resourceVersion: "1381746391"
  uid: b0aa7e9d-da92-4dd0-9169-841cfad575d3
spec:
  containers:
  - command:
    - /bin/sh
    - /usr/local/bin/backup.sh
    envFrom:
    - configMapRef:
        name: backup-config
    image: ghcr.io/adfinis/openshift-etcd-backup
    imagePullPolicy: Always
    name: backup-etcd
    resources:
      limits:
        cpu: "1"
        memory: 512Mi
      requests:
        cpu: 500m
        memory: 128Mi
    securityContext:
      privileged: true
      runAsUser: 0
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /host
      name: host
    - mountPath: /backup
      name: volume-backup
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-xsb2x
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostNetwork: true
  hostPID: true
  imagePullSecrets:
  - name: etcd-backup-dockercfg-62t5j
  nodeSelector:
    node-role.kubernetes.io/master: ""
    node-role.kubernetes.io/worker: ""
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: etcd-backup
  serviceAccountName: etcd-backup
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  volumes:
  - hostPath:
      path: /
      type: Directory
    name: host
  - name: volume-backup
    persistentVolumeClaim:
      claimName: etcd-backup-pvc
  - name: kube-api-access-xsb2x
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
      - configMap:
          items:
          - key: service-ca.crt
            path: service-ca.crt
          name: openshift-service-ca.crt
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-03-02T09:10:52Z"
    message: '0/13 nodes are available: 13 node(s) didn''t match Pod''s node affinity/selector.'
    reason: Unschedulable
    status: "False"
    type: PodScheduled
  phase: Pending
  qosClass: Burstable
```
Looking at the pod YAML, I see that you have two node selectors that contradict each other:
```yaml
nodeSelector:
  node-role.kubernetes.io/master: ""
  node-role.kubernetes.io/worker: ""
```
The scheduler only considers nodes that match every key in the nodeSelector, and none of your nodes carries both the master and the worker role, which is why 0/13 nodes qualify. I would assume that you're starting the cron job in a namespace which has an openshift.io/node-selector annotation that adds the node selector for the worker:
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: example
  annotations:
    openshift.io/node-selector: node-role.kubernetes.io/worker=""
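```

If that is the case, a rough way to confirm and work around it (not tested on your cluster; the namespace name is taken from the pod YAML above) would be:

```sh
# Show the project node selector on the namespace the job runs in
oc get namespace infra-services -o yaml | grep openshift.io/node-selector

# One possible workaround: set the project node selector to an empty
# value so it no longer injects node-role.kubernetes.io/worker.
# Note: this affects every pod created in the namespace afterwards.
oc annotate namespace infra-services openshift.io/node-selector="" --overwrite
```

Alternatively, you could run the backup cron job in a dedicated namespace without an openshift.io/node-selector annotation, so that only the master node selector from backup-cronjob.yaml is applied.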
OK, thank you, I will give it a try.
I will be able to test this on Monday.
Any feedback on this? Was I able to guide you to a fix for your problem?
Hi,
At the moment it is still not working.
But maybe it is a general problem in my cluster, because GitOps pods also have problems with pod placement.
I have a question open with Red Hat; I will keep you informed.
Thank you for now.