Controller manager seems to be not working properly due to caching errors
changhyuni opened this issue · 1 comments
changhyuni commented
/kind bug
What steps did you take and what happened:
I0823 03:05:37.226125 1 request.go:1212] Response Body: {"kind":"APIResourceList","apiVersion":"v1","groupVersion":"controlplane.cluster.x-k8s.io/v1beta2","resources":[{"name":"awsmanagedcontrolplanes","singularName":"awsmanagedcontrolplane","namespaced":true,"kind":"AWSManagedControlPlane","verbs":["delete","deletecollection","get","list","patch","create","update","watch"],"shortNames":["awsmcp"],"categories":["cluster-api"],"storageVersionHash":"WnEFh7oqH48="},{"name":"awsmanagedcontrolplanes/status","singularName":"","namespaced":true,"kind":"AWSManagedControlPlane","verbs":["get","patch","update"]},{"name":"rosacontrolplanes","singularName":"rosacontrolplane","namespaced":true,"kind":"ROSAControlPlane","verbs":["delete","deletecollection","get","list","patch","create","update","watch"],"shortNames":["rosacp"],"categories":["cluster-api"],"storageVersionHash":"qdhYg8dFBqo="},{"name":"rosacontrolplanes/status","singularName":"","namespaced":true,"kind":"ROSAControlPlane","verbs":["get","patch","update"]}]}
I0823 03:05:37.226379 1 shared_informer.go:337] stop requested
E0823 03:05:37.226393 1 kind.go:68] "controller-runtime/source/EventHandler: failed to get informer from cache" err="Timeout: failed waiting for *v1beta2.AWSManagedControlPlane Informer to sync"
I0823 03:05:37.226424 1 reflector.go:289] Starting reflector *v1beta2.AWSManagedControlPlane (9m13.30993253s) from pkg/mod/k8s.io/client-go@v0.28.4/tools/cache/reflector.go:229
I0823 03:05:37.226445 1 shared_informer.go:337] stop requested
I0823 03:05:37.226459 1 shared_informer.go:337] stop requested
E0823 03:05:37.226464 1 kind.go:68] "controller-runtime/source/EventHandler: failed to get informer from cache" err="Timeout: failed waiting for *v1beta2.AWSManagedControlPlane Informer to sync"
I0823 03:05:37.226448 1 reflector.go:289] Starting reflector *v1beta2.AWSManagedCluster (10m42.211609855s) from pkg/mod/k8s.io/client-go@v0.28.4/tools/cache/reflector.go:229
E0823 03:05:37.226477 1 kind.go:68] "controller-runtime/source/EventHandler: failed to get informer from cache" err="Timeout: failed waiting for *v1beta2.AWSCluster Informer to sync"
I0823 03:05:37.226481 1 reflector.go:295] Stopping reflector *v1beta2.AWSManagedCluster (10m42.211609855s) from pkg/mod/k8s.io/client-go@v0.28.4/tools/cache/reflector.go:229
I0823 03:05:37.226486 1 shared_informer.go:337] stop requested
I0823 03:05:37.226433 1 shared_informer.go:337] stop requested
E0823 03:05:37.226497 1 kind.go:68] "controller-runtime/source/EventHandler: failed to get informer from cache" err="Timeout: failed waiting for *v1beta2.AWSCluster Informer to sync"
I0823 03:05:37.226437 1 shared_informer.go:337] stop requested
E0823 03:05:37.226511 1 kind.go:68] "controller-runtime/source/EventHandler: failed to get informer from cache" err="Timeout: failed waiting for *v1beta1.Machine Informer to sync"
I0823 03:05:37.226449 1 reflector.go:295] Stopping reflector *v1beta2.AWSManagedControlPlane (9m13.30993253s) from pkg/mod/k8s.io/client-go@v0.28.4/tools/cache/reflector.go:229
I0823 03:05:37.226538 1 internal.go:530] "Stopping and waiting for webhooks"
E0823 03:05:37.226499 1 kind.go:68] "controller-runtime/source/EventHandler: failed to get informer from cache" err="Timeout: failed waiting for *v1beta2.AWSManagedCluster Informer to sync"
I0823 03:05:37.226429 1 shared_informer.go:337] stop requested
E0823 03:05:37.226562 1 kind.go:68] "controller-runtime/source/EventHandler: failed to get informer from cache" err="Timeout: failed waiting for *v1beta2.AWSManagedCluster Informer to sync"
I0823 03:05:37.226441 1 shared_informer.go:337] stop requested
E0823 03:05:37.226576 1 kind.go:68] "controller-runtime/source/EventHandler: failed to get informer from cache" err="Timeout: failed waiting for *v1beta2.AWSManagedControlPlane Informer to sync"
I0823 03:05:37.226596 1 server.go:249] "controller-runtime/webhook: Shutting down webhook server with timeout of 1 minute"
I0823 03:05:37.226660 1 internal.go:533] "Stopping and waiting for HTTP servers"
I0823 03:05:37.226688 1 server.go:43] "shutting down server" kind="health probe" addr="[::]:9440"
I0823 03:05:37.226717 1 internal.go:537] "Wait completed, proceeding to shutdown the manager"
E0823 03:05:37.226747 1 logger.go:99] "setup: problem running manager" err="failed to start metrics server: failed to create listener: listen tcp: address 8443: missing port in address"
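A note on the last log line: the "missing port in address" text is the error Go's net package returns when an address has no colon separating host and port. This suggests the value 8443 passed to the diagnostics/metrics listener probably needs the host:port form (for example :8443). The informer sync timeouts above may then be a side effect of the manager shutting down, not the root cause. The following is a minimal sketch of that parsing behavior only; checkAddr is a hypothetical helper for illustration and is not part of CAPA or controller-runtime.

```go
package main

import (
	"fmt"
	"net"
)

// checkAddr is a hypothetical helper (not from CAPA): it reports whether addr
// parses as host:port, which is the form net.Listen("tcp", addr) requires.
func checkAddr(addr string) error {
	_, _, err := net.SplitHostPort(addr)
	return err
}

func main() {
	// "8443" has no colon, so it is rejected; ":8443" means "all interfaces, port 8443".
	for _, addr := range []string{"8443", ":8443"} {
		if err := checkAddr(addr); err != nil {
			fmt.Printf("%q rejected: %v\n", addr, err)
			continue
		}
		fmt.Printf("%q accepted\n", addr)
	}
}
```

For the bare value 8443 the returned error reads "address 8443: missing port in address", which matches the tail of the "failed to create listener" message in the log.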
Environment:
Here is my manifest (controller manager):
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    argocd.argoproj.io/instance: cluster-api
    cluster.x-k8s.io/provider: infrastructure-aws
    control-plane: capa-controller-manager
  name: capa-controller-manager
  namespace: capa-system
spec:
  replicas: 1
  selector:
    matchLabels:
      cluster.x-k8s.io/provider: infrastructure-aws
      control-plane: capa-controller-manager
  template:
    metadata:
      labels:
        cluster.x-k8s.io/provider: infrastructure-aws
        control-plane: capa-controller-manager
    spec:
      containers:
        - args:
            - '--leader-elect'
            - '--feature-gates=EKS=true'
            - '--v=10'
            - '--diagnostics-address=8443'
            - '--insecure-diagnostics=false'
          env:
            - name: AWS_SHARED_CREDENTIALS_FILE
              value: /home/.aws/credentials
          image: >-
            kcr.dev.kabang.cloud/container-registry/external/cluster-api/cluster-api-aws-controller:v2.4.1
          imagePullPolicy: IfNotPresent
          imagePullSecrets: kcr-token
          livenessProbe:
            failureThreshold: 3
            httpGet:
              path: /healthz
              port: healthz
            periodSeconds: 10
          name: manager
          ports:
            - containerPort: 9443
              name: webhook-server
              protocol: TCP
            - containerPort: 9440
              name: healthz
              protocol: TCP
            - containerPort: 8443
              name: metrics
              protocol: TCP
          readinessProbe:
            httpGet:
              path: /readyz
              port: healthz
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
                - ALL
            runAsGroup: 65532
            runAsUser: 65532
          volumeMounts:
            - mountPath: /tmp/k8s-webhook-server/serving-certs
              name: cert
              readOnly: true
      securityContext:
        fsGroup: 1000
        runAsNonRoot: true
        seccompProfile:
          type: RuntimeDefault
      serviceAccountName: capa-controller-manager
      terminationGracePeriodSeconds: 10
      tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/master
        - effect: NoSchedule
          key: node-role.kubernetes.io/control-plane
      volumes:
        - name: cert
          secret:
            defaultMode: 420
            secretName: capa-webhook-service-cert
- Cluster-api-provider-aws version: v2.4.1
- Kubernetes version: (use kubectl version): 1.27 (eks)
- OS (e.g. from /etc/os-release):
k8s-ci-robot commented
This issue is currently awaiting triage.
If CAPA/CAPI contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.