nginxinc/nginx-service-mesh

Spire Agent Pods Fail to Deploy

therealnoof opened this issue · 12 comments

When I try to deploy the service mesh, the deployment hangs while initializing the spire-agent pods.
It eventually gives up and NGINX rolls back.

K8s version v1.20.1
running k8s on Ubuntu 18.04.5 LTS EC2 AWS instances
1 control
2 workers

Nodes are in a healthy state
Installed nginx-meshctl from the download, obtained via git

ubuntu@k8s-control:~$ kubectl get pods -n nginx-mesh
NAME READY STATUS RESTARTS AGE
spire-agent-kxdnr 0/1 Init:CrashLoopBackOff 6 12m
spire-agent-pgxz8 0/1 Init:CrashLoopBackOff 6 12m
spire-server-6cfc7df7d7-ftkjx 2/2 Running 0 12m

ubuntu@k8s-control:~$ kubectl logs spire-agent-7w9c9 -n nginx-mesh
Error from server (BadRequest): container "spire-agent" in pod "spire-agent-7w9c9" is waiting to start: PodInitializing

ubuntu@k8s-control:~$ kubectl describe pod spire-agent-7w9c9 -n nginx-mesh
Name: spire-agent-7w9c9
Namespace: nginx-mesh
Priority: 0
Node: k8s-worker2/10.0.4.91
Start Time: Wed, 02 Jun 2021 23:47:48 +0000
Labels: app.kubernetes.io/name=spire-agent
app.kubernetes.io/part-of=nginx-service-mesh
controller-revision-hash=5c6d478476
pod-template-generation=1
Annotations:
Status: Pending
IP: 10.0.4.91
IPs:
IP: 10.0.4.91
Controlled By: DaemonSet/spire-agent
Init Containers:
init:
Container ID: containerd://4e8c126bab2bb00647679094b6e9083aed177509a20039b972a2e3d6b7e40ba9
Image: gcr.io/spiffe-io/wait-for-it
Image ID: gcr.io/spiffe-io/wait-for-it@sha256:d9bdc931e4404237d2fb0ba84db5ece88b236c40eeca570d786ee54fd243f4ae
Port:
Host Port:
Args:
-t
30
spire-server:8081
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 143
Started: Wed, 02 Jun 2021 23:56:28 +0000
Finished: Wed, 02 Jun 2021 23:56:58 +0000
Ready: False
Restart Count: 6
Environment:
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from spire-agent-token-94prq (ro)
Containers:
spire-agent:
Container ID:
Image: gcr.io/spiffe-io/spire-agent:0.12.1
Image ID:
Port:
Host Port:
Args:
-config
/run/spire/config/agent.conf
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Liveness: exec [/opt/spire/bin/spire-agent healthcheck -shallow -socketPath /run/spire/sockets/agent.sock] delay=15s timeout=3s period=60s #success=1 #failure=2
Readiness: exec [/opt/spire/bin/spire-agent healthcheck -socketPath /run/spire/sockets/agent.sock] delay=5s timeout=1s period=5s #success=1 #failure=3
Environment:
MY_NODE_NAME: (v1:spec.nodeName)
Mounts:
/run/spire/bundle from spire-bundle (rw)
/run/spire/config from spire-config (ro)
/run/spire/sockets from spire-agent-socket (rw)
/var/run/secrets/kubernetes.io/serviceaccount from spire-agent-token-94prq (ro)
/var/run/secrets/tokens from spire-token (rw)
Conditions:
Type Status
Initialized False
Ready False
ContainersReady False
PodScheduled True
Volumes:
spire-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: spire-agent
Optional: false
spire-bundle:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: spire-bundle
Optional: false
spire-agent-socket:
Type: HostPath (bare host directory volume)
Path: /run/spire/sockets
HostPathType: DirectoryOrCreate
spire-token:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 7200
spire-agent-token-94prq:
Type: Secret (a volume populated by a Secret)
SecretName: spire-agent-token-94prq
Optional: false
QoS Class: BestEffort
Node-Selectors:
Tolerations: node.kubernetes.io/disk-pressure:NoSchedule op=Exists
node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/network-unavailable:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists
node.kubernetes.io/pid-pressure:NoSchedule op=Exists
node.kubernetes.io/unreachable:NoExecute op=Exists
node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
Type Reason Age From Message


Normal Scheduled 9m50s default-scheduler Successfully assigned nginx-mesh/spire-agent-7w9c9 to k8s-worker2
Normal Pulled 9m50s kubelet Successfully pulled image "gcr.io/spiffe-io/wait-for-it" in 232.655347ms
Normal Pulled 9m19s kubelet Successfully pulled image "gcr.io/spiffe-io/wait-for-it" in 235.545751ms
Normal Pulled 8m36s kubelet Successfully pulled image "gcr.io/spiffe-io/wait-for-it" in 228.685843ms
Normal Started 7m39s (x4 over 9m50s) kubelet Started container init
Normal Pulled 7m39s kubelet Successfully pulled image "gcr.io/spiffe-io/wait-for-it" in 245.295299ms
Normal Created 6m28s (x5 over 9m50s) kubelet Created container init
Normal Pulling 6m28s (x5 over 9m51s) kubelet Pulling image "gcr.io/spiffe-io/wait-for-it"
Normal Pulled 6m28s kubelet Successfully pulled image "gcr.io/spiffe-io/wait-for-it" in 234.332423ms
Warning BackOff 4m45s (x13 over 8m48s) kubelet Back-off restarting failed container
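
A note on reading the init container status above: by shell convention an exit code above 128 means the process was ended by a signal (128 + signal number), so 143 points at SIGTERM, and the Started/Finished timestamps are 30 seconds apart, which would be consistent with the `-t 30` wait on spire-server:8081 running out. A minimal Python sketch of that arithmetic; `describe_exit_code` and `run_seconds` are hypothetical helper names, not part of kubectl or SPIRE.

```python
import signal
from email.utils import parsedate_to_datetime


def describe_exit_code(code: int) -> str:
    """Translate a container exit code into a readable description."""
    if code > 128:
        # Shell convention: 128 + N means "terminated by signal N".
        try:
            return f"killed by {signal.Signals(code - 128).name}"
        except ValueError:
            return f"killed by unknown signal {code - 128}"
    return f"exited with status {code}"


def run_seconds(started: str, finished: str) -> float:
    """Seconds between two timestamps in the format `kubectl describe` prints."""
    delta = parsedate_to_datetime(finished) - parsedate_to_datetime(started)
    return delta.total_seconds()


print(describe_exit_code(143))  # killed by SIGTERM
print(run_seconds("Wed, 02 Jun 2021 23:56:28 +0000",
                  "Wed, 02 Jun 2021 23:56:58 +0000"))  # 30.0
```

A run that always lasts the full timeout suggests the init container never reached the target at all, rather than reaching it and being rejected.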

Did you enable Service Account Token Volume Projection? https://docs.nginx.com/nginx-service-mesh/get-started/platform/kubeadm/

I already had these configured in my /etc/kubernetes/manifests/kube-apiserver.yaml.
Am I supposed to have these configured elsewhere?

- --service-account-key-file=/etc/kubernetes/pki/sa.pub
- --service-account-signing-key-file=/etc/kubernetes/pki/sa.key
- --service-account-api-audiences=api
- --service-account-issuer=api

Thanks for checking. Can you collect the spire server logs? kubectl logs spire-server-6cfc7df7d7-ftkjx -n nginx-mesh -c spire-server

Here are the logs from both containers in the spire server pod.

ubuntu@k8s-control:/run$ kubectl logs spire-server-6cfc7df7d7-f49fq -c spire-server -n nginx-mesh
time="2021-06-03T03:32:23Z" level=warning msg="Current umask 0022 is too permissive; setting umask 0027"
time="2021-06-03T03:32:23Z" level=info msg="Data directory: "/run/spire/data""
time="2021-06-03T03:32:23Z" level=info msg="Opening SQL database" db_type=sqlite3 subsystem_name=built-in_plugin.sql
time="2021-06-03T03:32:23Z" level=info msg="Initializing new database" subsystem_name=built-in_plugin.sql
time="2021-06-03T03:32:23Z" level=info msg="Connected to SQL database" read_only=false subsystem_name=built-in_plugin.sql type=sqlite3 version=3.25.2
time="2021-06-03T03:32:23Z" level=info msg="Plugin loaded" built-in_plugin=true plugin_name=k8s_psat plugin_services="[]" plugin_type=NodeAttestor subsystem_name=catalog
time="2021-06-03T03:32:23Z" level=info msg="Plugin loaded" built-in_plugin=true plugin_name=noop plugin_services="[]" plugin_type=NodeResolver subsystem_name=catalog
time="2021-06-03T03:32:23Z" level=info msg="Plugin loaded" built-in_plugin=true plugin_name=k8sbundle plugin_services="[]" plugin_type=Notifier subsystem_name=catalog
time="2021-06-03T03:32:23Z" level=info msg="Plugin loaded" built-in_plugin=true plugin_name=disk plugin_services="[]" plugin_type=KeyManager subsystem_name=catalog
time="2021-06-03T03:32:23Z" level=info msg="Plugins started"
time="2021-06-03T03:32:23Z" level=debug msg="Loading journal" path=/run/spire/data/journal.pem subsystem_name=ca_manager
time="2021-06-03T03:32:23Z" level=info msg="Journal loaded" jwt_keys=0 subsystem_name=ca_manager x509_cas=0
time="2021-06-03T03:32:23Z" level=debug msg="Preparing X509 CA" slot=A subsystem_name=ca_manager
time="2021-06-03T03:32:23Z" level=info msg="X509 CA prepared" expiration="2021-12-30T03:32:23Z" issued_at="2021-06-03T03:32:23Z" self_signed=true slot=A subsystem_name=ca_manager
time="2021-06-03T03:32:23Z" level=info msg="X509 CA activated" expiration="2021-12-30T03:32:23Z" issued_at="2021-06-03T03:32:23Z" slot=A subsystem_name=ca_manager
time="2021-06-03T03:32:23Z" level=debug msg="Successfully rotated X.509 CA" subsystem_name=ca_manager trust_domain_id="spiffe://example.org" ttl=1.814399963721197e+07
time="2021-06-03T03:32:23Z" level=debug msg="Preparing JWT key" slot=A subsystem_name=ca_manager
time="2021-06-03T03:32:23Z" level=info msg="JWT key prepared" expiration="2021-12-30T03:32:23Z" issued_at="2021-06-03T03:32:23Z" slot=A subsystem_name=ca_manager
time="2021-06-03T03:32:23Z" level=info msg="JWT key activated" expiration="2021-12-30T03:32:23Z" issued_at="2021-06-03T03:32:23Z" slot=A subsystem_name=ca_manager
time="2021-06-03T03:32:23Z" level=debug msg="Rotating server SVID" subsystem_name=svid_rotator
time="2021-06-03T03:32:23Z" level=debug msg="Signed X509 SVID" expiration="2021-06-03T04:32:23Z" spiffe_id="spiffe://example.org/spire/server" subsystem_name=ca
time="2021-06-03T03:32:23Z" level=info msg="Building in-memory entry cache" subsystem_name=endpoints
time="2021-06-03T03:32:23Z" level=info msg="Completed building in-memory entry cache" subsystem_name=endpoints
time="2021-06-03T03:32:23Z" level=debug msg="Initializing API endpoints" subsystem_name=endpoints
time="2021-06-03T03:32:23Z" level=debug msg="Starting checker" name=server subsystem_name=health
time="2021-06-03T03:32:23Z" level=info msg="Starting UDS server" address=/run/spire/sockets/spire-registration.sock subsystem_name=endpoints
time="2021-06-03T03:32:23Z" level=info msg="Starting TCP server" address="[::]:8081" subsystem_name=endpoints
time="2021-06-03T03:32:23Z" level=debug msg="Notifier handled event" event="bundle loaded" notifier=k8sbundle subsystem_name=ca_manager
time="2021-06-03T04:02:18Z" level=debug msg="Rotating server SVID" subsystem_name=svid_rotator
time="2021-06-03T04:02:18Z" level=debug msg="Signed X509 SVID" expiration="2021-06-03T05:02:18Z" spiffe_id="spiffe://example.org/spire/server" subsystem_name=ca
time="2021-06-03T04:32:13Z" level=debug msg="Rotating server SVID" subsystem_name=svid_rotator
time="2021-06-03T04:32:13Z" level=debug msg="Signed X509 SVID" expiration="2021-06-03T05:32:13Z" spiffe_id="spiffe://example.org/spire/server" subsystem_name=ca
time="2021-06-03T05:02:08Z" level=debug msg="Rotating server SVID" subsystem_name=svid_rotator
time="2021-06-03T05:02:08Z" level=debug msg="Signed X509 SVID" expiration="2021-06-03T06:02:08Z" spiffe_id="spiffe://example.org/spire/server" subsystem_name=ca
time="2021-06-03T05:32:03Z" level=debug msg="Rotating server SVID" subsystem_name=svid_rotator
time="2021-06-03T05:32:03Z" level=debug msg="Signed X509 SVID" expiration="2021-06-03T06:32:03Z" spiffe_id="spiffe://example.org/spire/server" subsystem_name=ca
time="2021-06-03T06:01:58Z" level=debug msg="Rotating server SVID" subsystem_name=svid_rotator
time="2021-06-03T06:01:58Z" level=debug msg="Signed X509 SVID" expiration="2021-06-03T07:01:58Z" spiffe_id="spiffe://example.org/spire/server" subsystem_name=ca
time="2021-06-03T06:31:53Z" level=debug msg="Rotating server SVID" subsystem_name=svid_rotator
time="2021-06-03T06:31:53Z" level=debug msg="Signed X509 SVID" expiration="2021-06-03T07:31:53Z" spiffe_id="spiffe://example.org/spire/server" subsystem_name=ca
time="2021-06-03T07:01:48Z" level=debug msg="Rotating server SVID" subsystem_name=svid_rotator
time="2021-06-03T07:01:48Z" level=debug msg="Signed X509 SVID" expiration="2021-06-03T08:01:48Z" spiffe_id="spiffe://example.org/spire/server" subsystem_name=ca
time="2021-06-03T07:31:43Z" level=debug msg="Rotating server SVID" subsystem_name=svid_rotator
time="2021-06-03T07:31:43Z" level=debug msg="Signed X509 SVID" expiration="2021-06-03T08:31:43Z" spiffe_id="spiffe://example.org/spire/server" subsystem_name=ca
time="2021-06-03T08:01:38Z" level=debug msg="Rotating server SVID" subsystem_name=svid_rotator
time="2021-06-03T08:01:38Z" level=debug msg="Signed X509 SVID" expiration="2021-06-03T09:01:38Z" spiffe_id="spiffe://example.org/spire/server" subsystem_name=ca
time="2021-06-03T08:31:33Z" level=debug msg="Rotating server SVID" subsystem_name=svid_rotator
time="2021-06-03T08:31:33Z" level=debug msg="Signed X509 SVID" expiration="2021-06-03T09:31:33Z" spiffe_id="spiffe://example.org/spire/server" subsystem_name=ca
time="2021-06-03T09:01:28Z" level=debug msg="Rotating server SVID" subsystem_name=svid_rotator
time="2021-06-03T09:01:28Z" level=debug msg="Signed X509 SVID" expiration="2021-06-03T10:01:28Z" spiffe_id="spiffe://example.org/spire/server" subsystem_name=ca
time="2021-06-03T09:31:23Z" level=debug msg="Rotating server SVID" subsystem_name=svid_rotator
time="2021-06-03T09:31:23Z" level=debug msg="Signed X509 SVID" expiration="2021-06-03T10:31:23Z" spiffe_id="spiffe://example.org/spire/server" subsystem_name=ca
time="2021-06-03T10:01:18Z" level=debug msg="Rotating server SVID" subsystem_name=svid_rotator
time="2021-06-03T10:01:18Z" level=debug msg="Signed X509 SVID" expiration="2021-06-03T11:01:18Z" spiffe_id="spiffe://example.org/spire/server" subsystem_name=ca
time="2021-06-03T10:31:13Z" level=debug msg="Rotating server SVID" subsystem_name=svid_rotator
time="2021-06-03T10:31:13Z" level=debug msg="Signed X509 SVID" expiration="2021-06-03T11:31:13Z" spiffe_id="spiffe://example.org/spire/server" subsystem_name=ca
time="2021-06-03T11:01:08Z" level=debug msg="Rotating server SVID" subsystem_name=svid_rotator
time="2021-06-03T11:01:08Z" level=debug msg="Signed X509 SVID" expiration="2021-06-03T12:01:08Z" spiffe_id="spiffe://example.org/spire/server" subsystem_name=ca
time="2021-06-03T11:31:03Z" level=debug msg="Rotating server SVID" subsystem_name=svid_rotator
time="2021-06-03T11:31:03Z" level=debug msg="Signed X509 SVID" expiration="2021-06-03T12:31:03Z" spiffe_id="spiffe://example.org/spire/server" subsystem_name=ca
time="2021-06-03T12:00:58Z" level=debug msg="Rotating server SVID" subsystem_name=svid_rotator
time="2021-06-03T12:00:58Z" level=debug msg="Signed X509 SVID" expiration="2021-06-03T13:00:58Z" spiffe_id="spiffe://example.org/spire/server" subsystem_name=ca
time="2021-06-03T12:30:53Z" level=debug msg="Rotating server SVID" subsystem_name=svid_rotator
time="2021-06-03T12:30:53Z" level=debug msg="Signed X509 SVID" expiration="2021-06-03T13:30:53Z" spiffe_id="spiffe://example.org/spire/server" subsystem_name=ca
time="2021-06-03T13:00:48Z" level=debug msg="Rotating server SVID" subsystem_name=svid_rotator
time="2021-06-03T13:00:48Z" level=debug msg="Signed X509 SVID" expiration="2021-06-03T14:00:48Z" spiffe_id="spiffe://example.org/spire/server" subsystem_name=ca

ubuntu@k8s-control:/run$ kubectl logs spire-server-6cfc7df7d7-f49fq -c k8s-workload-registrar -n nginx-mesh
time="2021-06-03T03:32:23Z" level=info msg="Connecting to local registration server socket unix:///run/spire/sockets/spire-registration.sock"
time="2021-06-03T03:32:23Z" level=info msg="Initializing SPIFFE ID CRD Mode"
time="2021-06-03T03:32:26Z" level=info msg="Created entry" entryID=c690b139-8f7e-4c04-b02e-52fd39510e5a spiffeID="spiffe://example.org/k8s-workload-registrar/nginx-mesh/node/k8s-control"
time="2021-06-03T03:32:27Z" level=info msg="Created entry" entryID=400faf52-347f-4a5e-b1e4-e4b893a01e8e spiffeID="spiffe://example.org/ns/nginx-mesh/sa/spire-agent"
time="2021-06-03T03:32:28Z" level=info msg="Created entry" entryID=2ea21676-5c44-4e7c-8b59-9169e67c5273 spiffeID="spiffe://example.org/k8s-workload-registrar/nginx-mesh/node/k8s-worker1"
time="2021-06-03T03:32:29Z" level=info msg="Created entry" entryID=1124ab61-7876-456c-9388-497ec31bb7d9 spiffeID="spiffe://example.org/ns/nginx-mesh/sa/spire-agent"
time="2021-06-03T03:32:29Z" level=info msg="Updated entry" entryID=c690b139-8f7e-4c04-b02e-52fd39510e5a spiffeID="spiffe://example.org/k8s-workload-registrar/nginx-mesh/node/k8s-control"
time="2021-06-03T03:32:30Z" level=info msg="Created entry" entryID=7a9cfb9c-b977-4899-91e9-b60b6e704ad8 spiffeID="spiffe://example.org/k8s-workload-registrar/nginx-mesh/node/k8s-worker2"
time="2021-06-03T03:33:00Z" level=info msg="Created entry" entryID=f07d6bc0-f76e-4aa0-b135-ca6ddf2c10a0 spiffeID="spiffe://example.org/ns/nginx-mesh/sa/spire-server"
time="2021-06-03T03:33:00Z" level=info msg="Updated entry" entryID=400faf52-347f-4a5e-b1e4-e4b893a01e8e spiffeID="spiffe://example.org/ns/nginx-mesh/sa/spire-agent"
time="2021-06-03T03:33:30Z" level=info msg="Created entry" entryID=8e02faeb-cd95-4b34-8a53-57fde3633d75 spiffeID="spiffe://example.org/ns/default/sa/default"
time="2021-06-03T03:33:30Z" level=info msg="Updated entry" entryID=2ea21676-5c44-4e7c-8b59-9169e67c5273 spiffeID="spiffe://example.org/k8s-workload-registrar/nginx-mesh/node/k8s-worker1"
time="2021-06-03T03:33:31Z" level=info msg="Adding DNS name" dnsName=spire-server.nginx-mesh.svc spiffeID=spire-server-6cfc7df7d7-f49fq
time="2021-06-03T03:34:00Z" level=info msg="Created entry" entryID=376d9190-cc9a-4275-82dd-675f0180b313 spiffeID="spiffe://example.org/ns/default/sa/default"
time="2021-06-03T03:34:00Z" level=info msg="Updated entry" entryID=1124ab61-7876-456c-9388-497ec31bb7d9 spiffeID="spiffe://example.org/ns/nginx-mesh/sa/spire-agent"
time="2021-06-03T03:34:01Z" level=info msg="Adding DNS name" dnsName=k8s-workload-registrar.nginx-mesh.svc spiffeID=spire-server-6cfc7df7d7-f49fq
time="2021-06-03T03:34:30Z" level=info msg="Created entry" entryID=3488997b-daed-4708-9575-009cb9b739e4 spiffeID="spiffe://example.org/ns/default/sa/default"
time="2021-06-03T03:34:30Z" level=info msg="Updated entry" entryID=7a9cfb9c-b977-4899-91e9-b60b6e704ad8 spiffeID="spiffe://example.org/k8s-workload-registrar/nginx-mesh/node/k8s-worker2"
time="2021-06-03T03:34:30Z" level=info msg="Updated entry" entryID=f07d6bc0-f76e-4aa0-b135-ca6ddf2c10a0 spiffeID="spiffe://example.org/ns/nginx-mesh/sa/spire-server"
time="2021-06-03T03:34:30Z" level=info msg="Updated entry" entryID=8e02faeb-cd95-4b34-8a53-57fde3633d75 spiffeID="spiffe://example.org/ns/default/sa/default"
time="2021-06-03T03:34:30Z" level=info msg="Updated entry" entryID=376d9190-cc9a-4275-82dd-675f0180b313 spiffeID="spiffe://example.org/ns/default/sa/default"
time="2021-06-03T03:34:30Z" level=info msg="Updated entry" entryID=3488997b-daed-4708-9575-009cb9b739e4 spiffeID="spiffe://example.org/ns/default/sa/default"
E0603 03:42:19.627337 25 reflector.go:382] pkg/mod/k8s.io/client-go@v0.18.2/tools/cache/reflector.go:125: Failed to watch *v1.Node: Get "https://10.96.0.1:443/api/v1/nodes?allowWatchBookmarks=true&resourceVersion=324802&timeoutSeconds=414&watch=true": dial tcp 10.96.0.1:443: connect: connection refused
E0603 03:42:19.627429 25 reflector.go:382] pkg/mod/k8s.io/client-go@v0.18.2/tools/cache/reflector.go:125: Failed to watch *v1.Endpoints: Get "https://10.96.0.1:443/api/v1/endpoints?allowWatchBookmarks=true&resourceVersion=324054&timeoutSeconds=364&watch=true": dial tcp 10.96.0.1:443: connect: connection refused
E0603 03:42:19.627478 25 reflector.go:382] pkg/mod/k8s.io/client-go@v0.18.2/tools/cache/reflector.go:125: Failed to watch *v1.Pod: Get "https://10.96.0.1:443/api/v1/pods?allowWatchBookmarks=true&resourceVersion=324950&timeoutSeconds=440&watch=true": dial tcp 10.96.0.1:443: connect: connection refused
E0603 03:42:19.627528 25 reflector.go:382] pkg/mod/k8s.io/client-go@v0.18.2/tools/cache/reflector.go:125: Failed to watch *v1beta1.SpiffeID: Get "https://10.96.0.1:443/apis/spiffeid.spiffe.io/v1beta1/spiffeids?allowWatchBookmarks=true&resourceVersion=324268&timeoutSeconds=395&watch=true": dial tcp 10.96.0.1:443: connect: connection refused
E0603 03:42:20.628258 25 reflector.go:382] pkg/mod/k8s.io/client-go@v0.18.2/tools/cache/reflector.go:125: Failed to watch *v1.Node: Get "https://10.96.0.1:443/api/v1/nodes?allowWatchBookmarks=true&resourceVersion=324802&timeoutSeconds=384&watch=true": dial tcp 10.96.0.1:443: connect: connection refused
E0603 03:42:20.629442 25 reflector.go:382] pkg/mod/k8s.io/client-go@v0.18.2/tools/cache/reflector.go:125: Failed to watch *v1.Endpoints: Get "https://10.96.0.1:443/api/v1/endpoints?allowWatchBookmarks=true&resourceVersion=324054&timeoutSeconds=387&watch=true": dial tcp 10.96.0.1:443: connect: connection refused
E0603 03:42:20.630560 25 reflector.go:382] pkg/mod/k8s.io/client-go@v0.18.2/tools/cache/reflector.go:125: Failed to watch *v1.Pod: Get "https://10.96.0.1:443/api/v1/pods?allowWatchBookmarks=true&resourceVersion=324950&timeoutSeconds=503&watch=true": dial tcp 10.96.0.1:443: connect: connection refused
E0603 03:42:20.631750 25 reflector.go:382] pkg/mod/k8s.io/client-go@v0.18.2/tools/cache/reflector.go:125: Failed to watch *v1beta1.SpiffeID: Get "https://10.96.0.1:443/apis/spiffeid.spiffe.io/v1beta1/spiffeids?allowWatchBookmarks=true&resourceVersion=324268&timeoutSeconds=365&watch=true": dial tcp 10.96.0.1:443: connect: connection refused
E0603 03:42:21.629203 25 reflector.go:382] pkg/mod/k8s.io/client-go@v0.18.2/tools/cache/reflector.go:125: Failed to watch *v1.Node: Get "https://10.96.0.1:443/api/v1/nodes?allowWatchBookmarks=true&resourceVersion=324802&timeoutSeconds=360&watch=true": dial tcp 10.96.0.1:443: connect: connection refused
E0603 03:42:21.630120 25 reflector.go:382] pkg/mod/k8s.io/client-go@v0.18.2/tools/cache/reflector.go:125: Failed to watch *v1.Endpoints: Get "https://10.96.0.1:443/api/v1/endpoints?allowWatchBookmarks=true&resourceVersion=324054&timeoutSeconds=408&watch=true": dial tcp 10.96.0.1:443: connect: connection refused
E0603 03:42:21.631322 25 reflector.go:382] pkg/mod/k8s.io/client-go@v0.18.2/tools/cache/reflector.go:125: Failed to watch *v1.Pod: Get "https://10.96.0.1:443/api/v1/pods?allowWatchBookmarks=true&resourceVersion=324950&timeoutSeconds=471&watch=true": dial tcp 10.96.0.1:443: connect: connection refused
E0603 03:42:21.632513 25 reflector.go:382] pkg/mod/k8s.io/client-go@v0.18.2/tools/cache/reflector.go:125: Failed to watch *v1beta1.SpiffeID: Get "https://10.96.0.1:443/apis/spiffeid.spiffe.io/v1beta1/spiffeids?allowWatchBookmarks=true&resourceVersion=324268&timeoutSeconds=558&watch=true": dial tcp 10.96.0.1:443: connect: connection refused
E0603 03:42:22.630157 25 reflector.go:382] pkg/mod/k8s.io/client-go@v0.18.2/tools/cache/reflector.go:125: Failed to watch *v1.Node: Get "https://10.96.0.1:443/api/v1/nodes?allowWatchBookmarks=true&resourceVersion=324802&timeoutSeconds=387&watch=true": dial tcp 10.96.0.1:443: connect: connection refused
E0603 03:42:22.630988 25 reflector.go:382] pkg/mod/k8s.io/client-go@v0.18.2/tools/cache/reflector.go:125: Failed to watch *v1.Endpoints: Get "https://10.96.0.1:443/api/v1/endpoints?allowWatchBookmarks=true&resourceVersion=324054&timeoutSeconds=389&watch=true": dial tcp 10.96.0.1:443: connect: connection refused
E0603 03:42:22.632040 25 reflector.go:382] pkg/mod/k8s.io/client-go@v0.18.2/tools/cache/reflector.go:125: Failed to watch *v1.Pod: Get "https://10.96.0.1:443/api/v1/pods?allowWatchBookmarks=true&resourceVersion=324950&timeoutSeconds=525&watch=true": dial tcp 10.96.0.1:443: connect: connection refused
E0603 03:42:22.633163 25 reflector.go:382] pkg/mod/k8s.io/client-go@v0.18.2/tools/cache/reflector.go:125: Failed to watch *v1beta1.SpiffeID: Get "https://10.96.0.1:443/apis/spiffeid.spiffe.io/v1beta1/spiffeids?allowWatchBookmarks=true&resourceVersion=324268&timeoutSeconds=361&watch=true": dial tcp 10.96.0.1:443: connect: connection refused
E0603 03:42:23.631157 25 reflector.go:382] pkg/mod/k8s.io/client-go@v0.18.2/tools/cache/reflector.go:125: Failed to watch *v1.Node: Get "https://10.96.0.1:443/api/v1/nodes?allowWatchBookmarks=true&resourceVersion=324802&timeoutSeconds=559&watch=true": dial tcp 10.96.0.1:443: connect: connection refused
E0603 03:42:23.632031 25 reflector.go:382] pkg/mod/k8s.io/client-go@v0.18.2/tools/cache/reflector.go:125: Failed to watch *v1.Endpoints: Get "https://10.96.0.1:443/api/v1/endpoints?allowWatchBookmarks=true&resourceVersion=324054&timeoutSeconds=509&watch=true": dial tcp 10.96.0.1:443: connect: connection refused
E0603 03:42:23.633055 25 reflector.go:382] pkg/mod/k8s.io/client-go@v0.18.2/tools/cache/reflector.go:125: Failed to watch *v1.Pod: Get "https://10.96.0.1:443/api/v1/pods?allowWatchBookmarks=true&resourceVersion=324950&timeoutSeconds=457&watch=true": dial tcp 10.96.0.1:443: connect: connection refused
E0603 03:42:23.634227 25 reflector.go:382] pkg/mod/k8s.io/client-go@v0.18.2/tools/cache/reflector.go:125: Failed to watch *v1beta1.SpiffeID: Get "https://10.96.0.1:443/apis/spiffeid.spiffe.io/v1beta1/spiffeids?allowWatchBookmarks=true&resourceVersion=324268&timeoutSeconds=308&watch=true": dial tcp 10.96.0.1:443: connect: connection refused
E0603 03:42:24.632159 25 reflector.go:382] pkg/mod/k8s.io/client-go@v0.18.2/tools/cache/reflector.go:125: Failed to watch *v1.Node: Get "https://10.96.0.1:443/api/v1/nodes?allowWatchBookmarks=true&resourceVersion=324802&timeoutSeconds=347&watch=true": dial tcp 10.96.0.1:443: connect: connection refused
E0603 03:42:24.633077 25 reflector.go:382] pkg/mod/k8s.io/client-go@v0.18.2/tools/cache/reflector.go:125: Failed to watch *v1.Endpoints: Get "https://10.96.0.1:443/api/v1/endpoints?allowWatchBookmarks=true&resourceVersion=324054&timeoutSeconds=482&watch=true": dial tcp 10.96.0.1:443: connect: connection refused
E0603 03:42:24.634158 25 reflector.go:382] pkg/mod/k8s.io/client-go@v0.18.2/tools/cache/reflector.go:125: Failed to watch *v1.Pod: Get "https://10.96.0.1:443/api/v1/pods?allowWatchBookmarks=true&resourceVersion=324950&timeoutSeconds=592&watch=true": dial tcp 10.96.0.1:443: connect: connection refused
E0603 03:42:24.635327 25 reflector.go:382] pkg/mod/k8s.io/client-go@v0.18.2/tools/cache/reflector.go:125: Failed to watch *v1beta1.SpiffeID: Get "https://10.96.0.1:443/apis/spiffeid.spiffe.io/v1beta1/spiffeids?allowWatchBookmarks=true&resourceVersion=324268&timeoutSeconds=323&watch=true": dial tcp 10.96.0.1:443: connect: connection refused
E0603 03:42:25.633044 25 reflector.go:382] pkg/mod/k8s.io/client-go@v0.18.2/tools/cache/reflector.go:125: Failed to watch *v1.Node: Get "https://10.96.0.1:443/api/v1/nodes?allowWatchBookmarks=true&resourceVersion=324802&timeoutSeconds=478&watch=true": dial tcp 10.96.0.1:443: connect: connection refused
E0603 03:42:25.633894 25 reflector.go:382] pkg/mod/k8s.io/client-go@v0.18.2/tools/cache/reflector.go:125: Failed to watch *v1.Endpoints: Get "https://10.96.0.1:443/api/v1/endpoints?allowWatchBookmarks=true&resourceVersion=324054&timeoutSeconds=317&watch=true": dial tcp 10.96.0.1:443: connect: connection refused
E0603 03:42:25.635138 25 reflector.go:382] pkg/mod/k8s.io/client-go@v0.18.2/tools/cache/reflector.go:125: Failed to watch *v1.Pod: Get "https://10.96.0.1:443/api/v1/pods?allowWatchBookmarks=true&resourceVersion=324950&timeoutSeconds=507&watch=true": dial tcp 10.96.0.1:443: connect: connection refused
E0603 03:42:25.636137 25 reflector.go:382] pkg/mod/k8s.io/client-go@v0.18.2/tools/cache/reflector.go:125: Failed to watch *v1beta1.SpiffeID: Get "https://10.96.0.1:443/apis/spiffeid.spiffe.io/v1beta1/spiffeids?allowWatchBookmarks=true&resourceVersion=324268&timeoutSeconds=390&watch=true": dial tcp 10.96.0.1:443: connect: connection refused
E0603 03:42:26.634155 25 reflector.go:382] pkg/mod/k8s.io/client-go@v0.18.2/tools/cache/reflector.go:125: Failed to watch *v1.Node: Get "https://10.96.0.1:443/api/v1/nodes?allowWatchBookmarks=true&resourceVersion=324802&timeoutSeconds=351&watch=true": dial tcp 10.96.0.1:443: connect: connection refused
E0603 03:42:26.634837 25 reflector.go:382] pkg/mod/k8s.io/client-go@v0.18.2/tools/cache/reflector.go:125: Failed to watch *v1.Endpoints: Get "https://10.96.0.1:443/api/v1/endpoints?allowWatchBookmarks=true&resourceVersion=324054&timeoutSeconds=462&watch=true": dial tcp 10.96.0.1:443: connect: connection refused
E0603 03:42:26.635863 25 reflector.go:382] pkg/mod/k8s.io/client-go@v0.18.2/tools/cache/reflector.go:125: Failed to watch *v1.Pod: Get "https://10.96.0.1:443/api/v1/pods?allowWatchBookmarks=true&resourceVersion=324950&timeoutSeconds=463&watch=true": dial tcp 10.96.0.1:443: connect: connection refused
E0603 03:42:26.636945 25 reflector.go:382] pkg/mod/k8s.io/client-go@v0.18.2/tools/cache/reflector.go:125: Failed to watch *v1beta1.SpiffeID: Get "https://10.96.0.1:443/apis/spiffeid.spiffe.io/v1beta1/spiffeids?allowWatchBookmarks=true&resourceVersion=324268&timeoutSeconds=383&watch=true": dial tcp 10.96.0.1:443: connect: connection refused
time="2021-06-03T03:42:32Z" level=info msg="Updated entry" entryID=8e02faeb-cd95-4b34-8a53-57fde3633d75 spiffeID="spiffe://example.org/ns/default/sa/default"
time="2021-06-03T03:42:32Z" level=info msg="Updated entry" entryID=c690b139-8f7e-4c04-b02e-52fd39510e5a spiffeID="spiffe://example.org/k8s-workload-registrar/nginx-mesh/node/k8s-control"
time="2021-06-03T03:42:32Z" level=info msg="Updated entry" entryID=2ea21676-5c44-4e7c-8b59-9169e67c5273 spiffeID="spiffe://example.org/k8s-workload-registrar/nginx-mesh/node/k8s-worker1"
time="2021-06-03T03:42:32Z" level=info msg="Updated entry" entryID=7a9cfb9c-b977-4899-91e9-b60b6e704ad8 spiffeID="spiffe://example.org/k8s-workload-registrar/nginx-mesh/node/k8s-worker2"
time="2021-06-03T03:42:32Z" level=info msg="Updated entry" entryID=1124ab61-7876-456c-9388-497ec31bb7d9 spiffeID="spiffe://example.org/ns/nginx-mesh/sa/spire-agent"
time="2021-06-03T03:42:32Z" level=info msg="Updated entry" entryID=f07d6bc0-f76e-4aa0-b135-ca6ddf2c10a0 spiffeID="spiffe://example.org/ns/nginx-mesh/sa/spire-server"
time="2021-06-03T03:42:32Z" level=info msg="Updated entry" entryID=3488997b-daed-4708-9575-009cb9b739e4 spiffeID="spiffe://example.org/ns/default/sa/default"
time="2021-06-03T03:42:32Z" level=info msg="Updated entry" entryID=376d9190-cc9a-4275-82dd-675f0180b313 spiffeID="spiffe://example.org/ns/default/sa/default"
time="2021-06-03T03:42:32Z" level=info msg="Updated entry" entryID=400faf52-347f-4a5e-b1e4-e4b893a01e8e spiffeID="spiffe://example.org/ns/nginx-mesh/sa/spire-agent"
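
The reflector errors above repeat the same few failures many times. A rough Python sketch for condensing output like this counts the `connection refused` lines per watched resource and target address. The regular expression is an assumption fitted to the lines in this paste, and `summarize_refused` is a hypothetical name.

```python
import re
from collections import Counter

# Fitted to lines such as:
#   ... Failed to watch *v1.Node: Get "https://...": dial tcp 10.96.0.1:443: connect: connection refused
REFUSED = re.compile(
    r"Failed to watch \*(?P<kind>[\w.]+):.*"
    r"dial tcp (?P<addr>[\d.]+:\d+): connect: connection refused"
)


def summarize_refused(log_text: str) -> Counter:
    """Count refused connections keyed by (resource kind, target address)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = REFUSED.search(line)
        if match:
            counts[(match["kind"], match["addr"])] += 1
    return counts


# Example: feed it the saved output of `kubectl logs ... -c k8s-workload-registrar`
#   for (kind, addr), n in summarize_refused(open("registrar.log").read()).items():
#       print(kind, addr, n)
```

Grouping by address makes it easier to see that every failure in a paste like this targets the same endpoint and port.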

All the "connection refused" errors are suspicious. Did your k8s API server restart?

I agree. I've been investigating the network errors, but no luck yet.

The API server restarted when I added the additional commands for token volume projection.

ubuntu@k8s-control:~$ kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-7f4f5bf95d-t5ccp 1/1 Running 7 3d3h
calico-node-sfvlf 0/1 Running 1 3d3h
calico-node-trkz4 0/1 Running 1 3d2h
calico-node-vwwgb 0/1 Running 1 3d2h
coredns-74ff55c5b-sgvhx 1/1 Running 1 3d3h
coredns-74ff55c5b-zsr5h 1/1 Running 1 3d3h
etcd-k8s-control 1/1 Running 1 3d3h
kube-apiserver-k8s-control 1/1 Running 0 12h
kube-controller-manager-k8s-control 1/1 Running 6 3d3h
kube-proxy-hszlt 1/1 Running 1 3d3h
kube-proxy-qcbbr 1/1 Running 1 3d2h
kube-proxy-vk4hs 1/1 Running 1 3d2h
kube-scheduler-k8s-control 1/1 Running 6 3d3h

Is it possible there is a security groups or acl setting that is blocking communication?

This is a fairly new cluster. I checked for any network policies I may have deployed and there are none.
ubuntu@k8s-control:~$ kubectl get networkpolicy --all-namespaces
No resources found

I will continue to investigate the network error. If I can't find the cause, I'll probably tear the cluster down and rebuild; it's possible something got misconfigured.

I'll close this out if I end up rebuilding.

OK, found the networking issue. I was missing port 443 in my AWS security group. The kubeadm guide mentions 6443 but not 443, only that the port can be overridden. After changing my SG to allow all, NGINX Service Mesh installed. I should have noticed the 443 earlier in the logs, but I got tunnel vision looking at the k8s deployment.
I'll close this one out... sigh
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/
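
A quick TCP check from a worker node is one way to spot a blocked port like this. A minimal Python sketch follows; `check_ports` is a hypothetical helper, and the address and port in the usage comment are taken from the log lines in this thread, so substitute your own.

```python
import socket


def check_ports(host: str, ports, timeout: float = 3.0) -> dict:
    """Map each port to True if a TCP connection to host:port succeeds."""
    results = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Covers refused connections, timeouts, and unreachable hosts.
            results[port] = False
    return results


# Example, run on a worker node:
#   print(check_ports("10.96.0.1", [443]))
```

A `False` here only says the connection did not complete from that node; it does not distinguish a security group rule from any other cause.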

Thank you for the prompt replies, appreciate the team.

Closing this issue out, as it was related to a networking issue in AWS.

Thanks for following up @therealnoof