Connections to NATS server not closed
AMarti96 opened this issue · 5 comments
Hello team! We have been using NACK for a while against a NATS server running in Kubernetes, but today we started migrating to Synadia Cloud, since it will spare us from maintaining the NATS cluster ourselves.
However, when we tried to apply our existing NACK CRDs (tens of subjects and consumers across 10 different accounts), we started receiving errors from our instance. After some debugging, we realized the problem seems to be in how NACK handles its connections to the server.
Any suggestion or workaround, other than killing the NACK instance each time to reset the connection count, is appreciated!
What version were you using?
NACK v0.13.0, using the image natsio/jetstream-controller:0.13.0
What environment was the server running in?
A Synadia Cloud instance hosts the NATS server, and the number of connections is limited per account.
Is this defect reproducible?
Yes, it is
Create a new account in Synadia Cloud (the free tier is enough). Then start a NACK instance connected to that account and try to create one stream and one consumer.
To deploy NACK:
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: am-nack
helmCharts:
  - name: nack
    valuesInline:
      jetstream:
        enabled: true
        image:
          repository: natsio/jetstream-controller
          tag: 0.13.0
      namespaced: true
      namespaceOverride: am-nack
    releaseName: nack
    version: 0.24.0
    repo: https://nats-io.github.io/k8s/helm/charts/
Then create the following resources for NACK to process:
apiVersion: jetstream.nats.io/v1beta2
kind: Account
metadata:
  name: poc-a
spec:
  name: poc-a
  servers:
    - tls://connect.ngs.global
  creds:
    secret:
      name: nats-poc-a-creds
    file: poc_a.creds
---
apiVersion: jetstream.nats.io/v1beta2
kind: Stream
metadata:
  name: my-stream
spec:
  name: my-stream
  account: 'poc-a'
  subjects:
    - "my-subject"
  retention: "limits"
  maxConsumers: -1
  maxMsgsPerSubject: -1
  maxMsgs: 0
  maxBytes: 512
  maxAge: "0"
  maxMsgSize: -1
  storage: file
  discard: old
  replicas: 1
  duplicateWindow: "120000000000ns"
  denyDelete: false
  allowRollup: false
  allowDirect: false
---
apiVersion: jetstream.nats.io/v1beta2
kind: Consumer
metadata:
  name: my-consumer
spec:
  streamName: my-stream
  account: 'poc-a'
  ackPolicy: explicit
  ackWait: "30000000000ns"
  deliverPolicy: all
  deliverSubject: my-subject
  deliverGroup: my-subject
  durableName: my-subject
  filterSubject: my-subject
  maxAckPending: 1000
  maxDeliver: -1
  replayPolicy: instant
  replicas: 0
Once applied, the NACK logs show both resources being created correctly:
I1109 14:41:19.863759 1 main.go:122] Starting /jetstream-controller v0.13.0...
I1109 14:41:42.450251 1 event.go:298] Event(v1.ObjectReference{Kind:"Stream", Namespace:"am-nack", Name:"my-stream", UID:"a7d72228-2804-4a72-9c6d-a727407f71a4", APIVersion:"jetstream.nats.io/v1beta2", ResourceVersion:"74174200", FieldPath:""}): type: 'Normal' reason: 'Connecting' Connecting to new nats-servers
I1109 14:41:42.522663 1 event.go:298] Event(v1.ObjectReference{Kind:"Consumer", Namespace:"am-nack", Name:"my-consumer", UID:"35d8458b-8d58-4c34-a865-d78ffe495cc2", APIVersion:"jetstream.nats.io/v1beta2", ResourceVersion:"74174197", FieldPath:""}): type: 'Normal' reason: 'Connecting' Connecting to new nats-servers
I1109 14:41:42.557387 1 event.go:298] Event(v1.ObjectReference{Kind:"Stream", Namespace:"am-nack", Name:"my-stream", UID:"a7d72228-2804-4a72-9c6d-a727407f71a4", APIVersion:"jetstream.nats.io/v1beta2", ResourceVersion:"74174200", FieldPath:""}): type: 'Normal' reason: 'Creating' Creating stream "my-stream"
I1109 14:41:42.632145 1 event.go:298] Event(v1.ObjectReference{Kind:"Consumer", Namespace:"am-nack", Name:"my-consumer", UID:"35d8458b-8d58-4c34-a865-d78ffe495cc2", APIVersion:"jetstream.nats.io/v1beta2", ResourceVersion:"74174197", FieldPath:""}): type: 'Normal' reason: 'Creating' Creating consumer "my-consumer" on stream "my-stream"
I1109 14:41:42.637505 1 event.go:298] Event(v1.ObjectReference{Kind:"Stream", Namespace:"am-nack", Name:"my-stream", UID:"a7d72228-2804-4a72-9c6d-a727407f71a4", APIVersion:"jetstream.nats.io/v1beta2", ResourceVersion:"74174200", FieldPath:""}): type: 'Normal' reason: 'Connecting' Connecting to new nats-servers
I1109 14:41:42.739325 1 event.go:298] Event(v1.ObjectReference{Kind:"Consumer", Namespace:"am-nack", Name:"my-consumer", UID:"35d8458b-8d58-4c34-a865-d78ffe495cc2", APIVersion:"jetstream.nats.io/v1beta2", ResourceVersion:"74174197", FieldPath:""}): type: 'Normal' reason: 'Connecting' Connecting to new nats-servers
I1109 14:41:42.835188 1 event.go:298] Event(v1.ObjectReference{Kind:"Stream", Namespace:"am-nack", Name:"my-stream", UID:"a7d72228-2804-4a72-9c6d-a727407f71a4", APIVersion:"jetstream.nats.io/v1beta2", ResourceVersion:"74174200", FieldPath:""}): type: 'Normal' reason: 'Created' Created stream "my-stream"
I1109 14:41:42.936729 1 event.go:298] Event(v1.ObjectReference{Kind:"Consumer", Namespace:"am-nack", Name:"my-consumer", UID:"35d8458b-8d58-4c34-a865-d78ffe495cc2", APIVersion:"jetstream.nats.io/v1beta2", ResourceVersion:"74174197", FieldPath:""}): type: 'Normal' reason: 'Created' Created consumer "my-consumer" on stream "my-stream"
They also appear in the Synadia Cloud UI.
However, the Connections count stays at 2 and never goes down (I waited for more than an hour with no change).
Similarly, if the Stream or Consumer spec contains any kind of typo, NACK keeps opening new connections, without bound, on every reconcile retry, until the Synadia account stops accepting connections.
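A minimal Go simulation of the retry pattern described above. It assumes, as the observed behaviour suggests but without confirmation from NACK's source, that each failed reconcile attempt dials its own connection. `fakeConn`, `dial`, `reconcileLeaky`, `reconcileClosing`, and `retry` are hypothetical stand-ins, not NACK or nats.go APIs; they only count how many connections are open.

```go
package main

import (
	"errors"
	"fmt"
)

// fakeConn stands in for a NATS connection; it only tracks the open count.
type fakeConn struct{ open *int }

// dial simulates opening a connection by incrementing the shared counter.
func dial(open *int) *fakeConn {
	*open++
	return &fakeConn{open: open}
}

// Close simulates closing the connection.
func (c *fakeConn) Close() { *c.open-- }

// errInvalidSpec models a reconcile that always fails, e.g. a typo in the spec.
var errInvalidSpec = errors.New("invalid stream spec")

// reconcileLeaky dials on every attempt and never closes the connection.
func reconcileLeaky(open *int) error {
	_ = dial(open)
	return errInvalidSpec
}

// reconcileClosing also dials per attempt, but always closes before returning.
func reconcileClosing(open *int) error {
	c := dial(open)
	defer c.Close()
	return errInvalidSpec
}

// retry calls fn up to attempts times, stopping at the first success.
func retry(attempts int, fn func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
	}
	return err
}

func main() {
	leaked, closed := 0, 0
	_ = retry(50, func() error { return reconcileLeaky(&leaked) })
	_ = retry(50, func() error { return reconcileClosing(&closed) })
	fmt.Println("leaky:", leaked, "closing:", closed) // leaky: 50 closing: 0
}
```

With the leaky variant, open connections grow linearly with the number of retries; closing on every return path keeps the count flat. Reusing a single connection per account, instead of dialing per attempt, would also avoid the repeated dial cost.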
Given the capability you are leveraging, describe your expectation?
I would expect NACK to use only one connection to NATS for a set of resources that all point to the same account, rather than opening a new connection every time the reconcile loop runs.
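One way to meet this expectation is to cache connections by account identity. The sketch below is hypothetical: `accountConfig`, `connCache`, and `get` are illustrative names, not NACK's types, and the cache key (sorted server URLs plus the creds file) is an assumption about which Account fields determine a connection. A real version would dial through the NATS client and would also need to detect and replace connections that have closed.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// accountConfig holds the connection-relevant fields of an Account resource.
// Field names are illustrative, not NACK's actual types.
type accountConfig struct {
	Servers   []string
	CredsFile string
}

// key builds a stable cache key so that server order does not matter.
func (a accountConfig) key() string {
	s := append([]string(nil), a.Servers...)
	sort.Strings(s)
	return strings.Join(s, ",") + "|" + a.CredsFile
}

// conn is a placeholder for a real NATS connection.
type conn struct{ id int }

// connCache hands out one connection per distinct account configuration.
type connCache struct {
	mu    sync.Mutex
	conns map[string]*conn
	dials int
}

func newConnCache() *connCache { return &connCache{conns: map[string]*conn{}} }

// get returns the cached connection for cfg, dialing only on first use.
func (c *connCache) get(cfg accountConfig) *conn {
	c.mu.Lock()
	defer c.mu.Unlock()
	k := cfg.key()
	if cn, ok := c.conns[k]; ok {
		return cn
	}
	c.dials++ // a real implementation would dial NATS here
	cn := &conn{id: c.dials}
	c.conns[k] = cn
	return cn
}

func main() {
	cache := newConnCache()
	pocA := accountConfig{Servers: []string{"tls://connect.ngs.global"}, CredsFile: "poc_a.creds"}
	// Reconcile a stream and a consumer several times, both for the same account.
	for i := 0; i < 5; i++ {
		cache.get(pocA) // stream reconcile
		cache.get(pocA) // consumer reconcile
	}
	fmt.Println("connections dialed:", cache.dials) // connections dialed: 1
}
```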
Given the expectation, what is the defect you are observing?
NACK creates more connections than necessary, and old connections are never closed.
To provide more detail, I extracted the number of connections on our current NATS server (running inside a Kubernetes cluster, with Streams/Consumers populated via NACK). Using nats-top, I got the following:
As you can see, all of them (95 connections in this particular screenshot) come from jetstream-controller, i.e. NACK creating the Streams/Consumers and never disconnecting.
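Connection counts like these can also be grouped by client name from the server's monitoring data. A minimal sketch, assuming the JSON has already been fetched from the `/connz` monitoring endpoint (commonly served on port 8222) and that clients set a connection name; only the `cid` and `name` fields are decoded, and the sample payload is made up for illustration. On busy servers the endpoint paginates, so the `limit`/`offset` query parameters may be needed to see every connection.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// connz mirrors only the fields of the /connz response used here.
type connz struct {
	NumConnections int `json:"num_connections"`
	Connections    []struct {
		CID  uint64 `json:"cid"`
		Name string `json:"name"`
	} `json:"connections"`
}

// countByName groups connections by their client-supplied connection name.
func countByName(raw []byte) (map[string]int, error) {
	var c connz
	if err := json.Unmarshal(raw, &c); err != nil {
		return nil, err
	}
	counts := make(map[string]int)
	for _, conn := range c.Connections {
		counts[conn.Name]++
	}
	return counts, nil
}

func main() {
	// Made-up sample payload in the shape returned by /connz.
	sample := []byte(`{"num_connections":3,"connections":[
		{"cid":1,"name":"jetstream-controller"},
		{"cid":2,"name":"jetstream-controller"},
		{"cid":3,"name":"my-app"}]}`)
	counts, err := countByName(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(counts["jetstream-controller"], counts["my-app"]) // 2 1
}
```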
The problem seems to occur only when the NATS connection is defined in the Account CRD. In the code, that means when crdConnect is set to true.
When a single NATS connection is set in the controller's global server settings instead (crdConnect set to false), only 1 connection is reported, no matter how many objects I create or how many times the connection is retried.
With that in mind, I think the bug may be in this part of the code:
nack/controllers/jetstream/stream.go, lines 183 to 199 at commit b6bb02b
Thanks for reporting! We should be able to add a connection pooler to NACK to prevent this. There is already an implementation in the nats-surveyor repo.
We'll port it over and should be able to get that done next week.
Connection pooling reference from nats-surveyor: https://github.com/nats-io/nats-surveyor/blob/main/surveyor/conn_pool.go
A connection pool was added in v0.14.0.
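A reference-counted pool is one way to implement the pooling approach discussed in this thread. The sketch below is not a port of nats-surveyor's conn_pool.go and does not reproduce its API; `refPool`, `Get`, `Release`, and `closer` are hypothetical names. Connections are shared per key (for example, one key per Account) and closed only when the last user releases them.

```go
package main

import (
	"fmt"
	"sync"
)

// closer is the minimal behaviour the pool needs from a connection.
type closer interface{ Close() }

// pooledConn pairs a shared connection with its reference count.
type pooledConn struct {
	conn closer
	refs int
}

// refPool shares one connection per key and closes it after the last release.
type refPool struct {
	mu    sync.Mutex
	dial  func(key string) (closer, error)
	conns map[string]*pooledConn
}

func newRefPool(dial func(string) (closer, error)) *refPool {
	return &refPool{dial: dial, conns: map[string]*pooledConn{}}
}

// Get returns the shared connection for key, dialing only if none is open.
func (p *refPool) Get(key string) (closer, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if pc, ok := p.conns[key]; ok {
		pc.refs++
		return pc.conn, nil
	}
	c, err := p.dial(key)
	if err != nil {
		return nil, err
	}
	p.conns[key] = &pooledConn{conn: c, refs: 1}
	return c, nil
}

// Release drops one reference and closes the connection when none remain.
func (p *refPool) Release(key string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	pc, ok := p.conns[key]
	if !ok {
		return
	}
	pc.refs--
	if pc.refs == 0 {
		pc.conn.Close()
		delete(p.conns, key)
	}
}

// fakeConn records whether it has been closed.
type fakeConn struct{ closed bool }

func (f *fakeConn) Close() { f.closed = true }

func main() {
	dials := 0
	pool := newRefPool(func(string) (closer, error) { dials++; return &fakeConn{}, nil })

	a, _ := pool.Get("poc-a") // stream reconcile
	b, _ := pool.Get("poc-a") // consumer reconcile shares the same connection
	pool.Release("poc-a")
	pool.Release("poc-a")

	fmt.Println(dials, a == b, a.(*fakeConn).closed) // 1 true true
}
```

For simplicity the mutex is held while dialing, which serializes dials across all keys; a production pool would likely dial outside the lock and also handle connections that drop on their own.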