NATS container restarts frequently in AKS cluster with the following error logs
saitessell opened this issue
### What version were you using?
Helm chart 1.0.2
My NATS Helm chart values file is:

```yaml
config:
  nats:
    tls:
      enabled: true
      secretName: ${NATS_CERT_SECRET_NAME}
      cert: "tls.crt"
      key: "tls.key"
    resources:
      limits:
        cpu: 256m
        memory: 256Mi
      requests:
        cpu: 100m
        memory: 128Mi
  jetstream:
    enabled: true
    memoryStore:
      enabled: true
      # ensure that container has a sufficient memory limit greater than maxSize
      maxSize: 5Gi
    fileStore:
      pvc:
        enabled: true
        size: 5Gi
        storageClassName: aks-storage-class # NOTE: Azure setup but customize as needed for your infra.
  cluster:
    enabled: true
    replicas: 3
    name: nats-cluster
    noAdvertise: true
  resolver:
    enabled: true
    merge:
      type: full
      interval: 2m
      timeout: 1.9s
      allow_delete: true
```
With this configuration I am unable to get NATS running, and I see the following logs in all of the NATS pods:
```
[1] 2024/02/12 05:56:42.769637 [ERR] Error trying to connect to route (attempt 1): lookup for host "nats-service-0.nats-service-headless": lookup nats-service-0.nats-service-headless on 172.29.255.254:53: no such host
[1] 2024/02/12 05:56:43.325412 [INF] JetStream cluster new metadata leader: nats-service-1/nats-service
[1] 2024/02/12 05:56:54.992207 [INF] 172.27.6.37:41188 - rid:8 - Route connection created
[1] 2024/02/12 05:56:54.992603 [INF] 172.27.6.37:41188 - rid:8 - Router connection closed: Duplicate Route
[1] 2024/02/12 05:57:00.934672 [INF] 172.27.5.82:41014 - rid:9 - Route connection created
[1] 2024/02/12 05:57:00.935144 [INF] 172.27.5.82:41014 - rid:9 - Router connection closed: Duplicate Route
[1] 2024/02/12 05:58:07.146084 [INF] JetStream cluster no metadata leader
[1] 2024/02/12 05:58:29.664137 [INF] JetStream cluster no metadata leader
[1] 2024/02/12 05:58:42.596377 [WRN] JetStream has not established contact with a meta leader
[1] 2024/02/12 05:58:50.704588 [INF] JetStream cluster no metadata leader
[1] 2024/02/12 05:59:13.759596 [INF] JetStream cluster no metadata leader
[1] 2024/02/12 05:59:39.226129 [INF] JetStream cluster no metadata leader
[1] 2024/02/12 06:00:00.277063 [INF] JetStream cluster no metadata leader
```
### What environment was the server running in?
NATS is deployed in an AKS cluster with kubeDNS.
### Is this defect reproducible?
Deploying the Helm chart with JetStream enabled in cluster mode causes the nats containers in the nats-service pods to fail their health-check probes.
### Given the capability you are leveraging, describe your expectation?
I want to run NATS in cluster mode with JetStream enabled.
### Given the expectation, what is the defect you are observing?
Because the container keeps failing, I am not able to bring up NATS.
This looks like the configuration for the 0.x Helm chart. Can you upgrade to the latest 1.x Helm chart? See the upgrade guide:
https://github.com/nats-io/k8s/blob/main/helm/charts/nats/UPGRADING.md
This is actually the config for the 1.x version of the Helm chart. I based it on the values file here:
https://github.com/nats-io/k8s/blob/nats-1.0.2/helm/charts/nats/values.yaml
Ah OK, I must have misread it then. The resources go under `container.merge`, not `config.nats`. Also, if you are going to give it 5Gi in `config.jetstream.memoryStore.maxSize`, you will want to make sure to request more than that amount of memory:
https://github.com/nats-io/k8s/blob/nats-1.0.2/helm/charts/nats/README.md#nats-container-resources
```yaml
container:
  env:
    # different from k8s units, suffix must be B, KiB, MiB, GiB, or TiB
    # should be ~90% of memory limit
    GOMEMLIMIT: 7GiB
  merge:
    # recommended limit is at least 2 CPU cores and 8Gi Memory for production JetStream clusters
    resources:
      requests:
        cpu: "2"
        memory: 8Gi
      limits:
        cpu: "2"
        memory: 8Gi
```
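The sizing rules above (memory limit above `memoryStore.maxSize`, `GOMEMLIMIT` at roughly 90% of the limit, written with Go's `GiB`/`MiB` suffixes rather than Kubernetes' `Gi`/`Mi`) can be sanity-checked with a small sketch like the one below. It is a hypothetical helper, not part of the chart or any NATS tool: `parseQuantity`, `limitExceedsMaxSize`, and `goMemLimit` are illustrative names, and only the binary suffixes `Ki`, `Mi`, `Gi`, `Ti` are handled.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// binarySuffixes maps Kubernetes binary quantity suffixes to byte multipliers.
// Decimal suffixes (k, M, G, ...) are deliberately not handled in this sketch.
var binarySuffixes = []struct {
	suffix string
	mult   int64
}{
	{"Ki", 1 << 10},
	{"Mi", 1 << 20},
	{"Gi", 1 << 30},
	{"Ti", 1 << 40},
}

// parseQuantity converts a quantity such as "256Mi" or "5Gi" into bytes.
// A plain integer is treated as a byte count.
func parseQuantity(q string) (int64, error) {
	for _, s := range binarySuffixes {
		if strings.HasSuffix(q, s.suffix) {
			n, err := strconv.ParseInt(strings.TrimSuffix(q, s.suffix), 10, 64)
			if err != nil {
				return 0, err
			}
			return n * s.mult, nil
		}
	}
	return strconv.ParseInt(q, 10, 64)
}

// limitExceedsMaxSize reports whether the container memory limit is larger
// than the JetStream memoryStore maxSize.
func limitExceedsMaxSize(limit, maxSize string) (bool, error) {
	l, err := parseQuantity(limit)
	if err != nil {
		return false, fmt.Errorf("limit %q: %w", limit, err)
	}
	m, err := parseQuantity(maxSize)
	if err != nil {
		return false, fmt.Errorf("maxSize %q: %w", maxSize, err)
	}
	return l > m, nil
}

// goMemLimit returns ~90% of the limit, rounded down, using the suffixes the
// Go runtime accepts for GOMEMLIMIT (GiB when at least 1 GiB, else MiB).
func goMemLimit(limitBytes int64) string {
	target := limitBytes * 9 / 10
	if target >= 1<<30 {
		return fmt.Sprintf("%dGiB", target/(1<<30))
	}
	return fmt.Sprintf("%dMiB", target/(1<<20))
}

func main() {
	// 256Mi is the limit from the original values file; 8Gi is the suggested one.
	for _, limit := range []string{"256Mi", "8Gi"} {
		ok, err := limitExceedsMaxSize(limit, "5Gi")
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		limitBytes, _ := parseQuantity(limit) // already validated above
		fmt.Printf("limit=%s > maxSize=5Gi: %v, GOMEMLIMIT~%s\n", limit, ok, goMemLimit(limitBytes))
	}
}
```

Run against the two limits from this thread, it should print `limit=256Mi > maxSize=5Gi: false, GOMEMLIMIT~230MiB` and `limit=8Gi > maxSize=5Gi: true, GOMEMLIMIT~7GiB`, the latter matching the `7GiB` chosen in the reply.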
From the looks of it, your containers are not able to establish network connectivity to one another. For example, it looks like you named your deployment `nats-service`, so from the `nats-service-0` pod you should be able to resolve and connect to `nats-service-1.nats-service-headless` and `nats-service-2.nats-service-headless`. The first log line above shows this failing: the DNS lookup for `nats-service-0.nats-service-headless` returns `no such host`.
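One way to check this from inside a pod is a small diagnostic like the sketch below, which resolves each peer name and opens a TCP connection to its route port. It assumes the StatefulSet and headless service names from this thread, three replicas, and the NATS default cluster (route) port 6222; adjust these if your release uses different values. `peerHosts` and `checkPeer` are hypothetical helper names, not part of NATS or the chart.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"strconv"
	"time"
)

// routePort is the NATS default cluster (route) port. This is an assumption:
// change it if the chart was configured with a different cluster port.
const routePort = 6222

// peerHosts builds the per-pod DNS names a StatefulSet gets through its
// headless service: <statefulset>-<ordinal>.<headless-service>.
func peerHosts(statefulSet, headless string, replicas int) []string {
	hosts := make([]string, 0, replicas)
	for i := 0; i < replicas; i++ {
		hosts = append(hosts, fmt.Sprintf("%s-%d.%s", statefulSet, i, headless))
	}
	return hosts
}

// checkPeer resolves host, then opens and immediately closes a TCP connection
// to the given port. The timeout bounds the lookup and the dial together.
func checkPeer(host string, port int, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	addrs, err := net.DefaultResolver.LookupHost(ctx, host)
	if err != nil {
		return fmt.Errorf("resolve %s: %w", host, err)
	}
	if len(addrs) == 0 {
		return fmt.Errorf("resolve %s: no addresses", host)
	}

	var d net.Dialer
	conn, err := d.DialContext(ctx, "tcp", net.JoinHostPort(addrs[0], strconv.Itoa(port)))
	if err != nil {
		return fmt.Errorf("dial %s (%s): %w", host, addrs[0], err)
	}
	return conn.Close()
}

func main() {
	// Names taken from this thread; these are assumptions about your release.
	hosts := peerHosts("nats-service", "nats-service-headless", 3)
	failures := 0
	for _, h := range hosts {
		if err := checkPeer(h, routePort, 3*time.Second); err != nil {
			fmt.Println("FAIL:", err)
			failures++
			continue
		}
		fmt.Println("OK:", h)
	}
	fmt.Printf("%d of %d peers unreachable\n", failures, len(hosts))
}
```

A `resolve ... no such host` failure here points at DNS for the headless service (the same error as the first log line), while a `dial` failure after a successful lookup points at network policy or the route port instead.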