nats-operator incompatible with Istio?
therealmitchconnors opened this issue · 19 comments
When I follow the instructions in the project readme to create a NATS cluster with 3 members on a GKE cluster using Istio, all three members immediately show as unhealthy and quickly go to CrashLoopBackOff. Is there something additional I need to do to get nats-operator to play nice with a service mesh?
My Nats Cluster:
echo '
apiVersion: "nats.io/v1alpha2"
kind: "NatsCluster"
metadata:
  name: "example-nats-cluster"
spec:
  size: 3
  version: "1.3.0"
' | kubectl apply -f -
Log from one member:
[1] 2018/10/30 20:27:15.907885 [INF] Starting nats-server version 1.3.0
[1] 2018/10/30 20:27:15.907943 [INF] Git commit [eed4fbc]
[1] 2018/10/30 20:27:15.908133 [INF] Starting http monitor on 0.0.0.0:8222
[1] 2018/10/30 20:27:15.908194 [INF] Listening for client connections on 0.0.0.0:4222
[1] 2018/10/30 20:27:15.908208 [INF] Server is ready
[1] 2018/10/30 20:27:15.908541 [INF] Listening for route connections on 0.0.0.0:6222
[1] 2018/10/30 20:27:15.914868 [ERR] Error trying to connect to route: dial tcp 10.12.12.4:6222: connect: connection refused
[1] 2018/10/30 20:27:16.930604 [ERR] Error trying to connect to route: dial tcp 10.12.12.4:6222: connect: connection refused
[1] 2018/10/30 20:27:17.935214 [INF] 10.12.12.4:6222 - rid:1 - Route connection created
[1] 2018/10/30 20:27:17.940613 [INF] 127.0.0.1:41486 - rid:2 - Route connection created
[1] 2018/10/30 20:27:18.962862 [INF] 10.12.12.4:6222 - rid:3 - Route connection created
(and the Route connection messages continue 290 times before the container is shut down as unhealthy)
My Istio deployment is the default Istio App from the GCP marketplace, with three nodes in it.
K8S version info:
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.7", GitCommit:"0c38c362511b20a098d7cd855f1314dad92c2780", GitTreeState:"clean", BuildDate:"2018-08-20T10:09:03Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"9+", GitVersion:"v1.9.7-gke.6", GitCommit:"9b635efce81582e1da13b35a7aa539c0ccb32987", GitTreeState:"clean", BuildDate:"2018-08-16T21:33:47Z", GoVersion:"go1.9.3b4", Compiler:"gc", Platform:"linux/amd64"}
istio-pilot version is 1.3
I'd be happy to add more detail if there are follow up questions. I can also cross-post this issue to Istio if the problem appears to be on their side...
One more detail: I am running with sidecars enabled, and the NATS pods get properly injected with the istio-proxy container, which is healthy.
NATS protocol requires direct connectivity to peers when trying to establish routes. With Istio, you are introducing a proxy (actually, two!) in between peers. You shouldn't do that!
Since I'm not versed in Istio, I don't have a concrete answer for you but I'm thinking that maybe this thread will help.
One way to support Istio would be to create services with manually managed endpoints. The service's name would be the pod name and the selector wouldn't be set, allowing the operator to create an endpoint manually. Deletion of the service as well as the endpoint would be handled by automatic garbage collection, by setting the owner of these to the pod. I'm currently doing that in one of our internal operators and it's quite painless. What I don't know is a) whether it's worth it and b) what else would need updating with regards to discovery and the certificate SANs (grasping at straws here as I didn't have time to read the whole source yet).
If Istio is configured to do mTLS then the whole TLS handling could be disabled in the operator, because it'd be handled transparently by Istio. In that case you'd gain the metrics generation via Istio while still being secure. Would anything stand in the way of taking a stab at implementing that? /cc @pires
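To make that concrete, here is a rough sketch of the kind of per-pod Service plus manually managed Endpoints the operator could create. This is not what nats-operator does today; all names, IPs, and the UID are illustrative placeholders:

```yaml
# Hypothetical per-pod Service: no selector, so Kubernetes won't manage
# the Endpoints; the operator writes them itself and parents both objects
# to the pod so garbage collection cleans them up with the pod.
apiVersion: v1
kind: Service
metadata:
  name: example-nats-cluster-1        # same name as the pod
  ownerReferences:
  - apiVersion: v1
    kind: Pod
    name: example-nats-cluster-1
    uid: <pod-uid>
spec:
  clusterIP: None
  ports:
  - name: tcp-route
    port: 6222
    targetPort: 6222
---
apiVersion: v1
kind: Endpoints
metadata:
  name: example-nats-cluster-1        # must match the Service name
  ownerReferences:
  - apiVersion: v1
    kind: Pod
    name: example-nats-cluster-1
    uid: <pod-uid>
subsets:
- addresses:
  - ip: 10.12.12.4                    # the pod IP, kept up to date by the operator
  ports:
  - name: tcp-route
    port: 6222
```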
@therealmitchconnors take a look at #111 to get it to work for now.
Making use of Istio to monitor NATS traffic would be great!
Even with simple applications based on NATS, it is hard to tell where communications are broken between services.
Note that I'm currently experimenting with NATS on OpenShift + Istio + Kiali (https://www.kiali.io/).
That issue was also raised on the Istio project side: istio/old_issues_repo#338
Hey all, just wanted to put this here for the record. I just spun up a NATS cluster using this operator. I created the Istio VirtualServices as you would normally do, and everything appears to be working as expected. The 3-node cluster is live and seems to be properly clustered. I checked the cluster routes by hitting the /routez endpoint on the management network.
Here are the virtual service definitions:
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: nats
spec:
  hosts:
  - nats-cluster.nats.svc.cluster.local
  tcp:
  - match:
    - port: 4222
    route:
    - destination:
        host: nats-cluster.nats.svc.cluster.local
        port:
          number: 4222
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: nats-management
spec:
  hosts:
  - nats-cluster-mgmt.nats.svc.cluster.local
  tcp:
  - match:
    - port: 8222
    route:
    - destination:
        host: nats-cluster-mgmt.nats.svc.cluster.local
        port:
          number: 8222
  - match:
    - port: 6222
    route:
    - destination:
        host: nats-cluster-mgmt.nats.svc.cluster.local
        port:
          number: 6222
Hi! @thedodd, did you have the sidecar injected? I'm currently struggling with the same issue, and I have the sidecar enabled.
I have managed to connect to NATS using telnet from a pod with a sidecar, but it fails from the NATS client.
@piotrmsc, I'm struggling with the same issue: client requests are failing. I tried using the tcp protocol for the VirtualService and ServiceEntry but wasn't able to get it working. Did you find a solution?
Here are my configurations:
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: nats.test.com
spec:
  hosts:
  - nats.test.com
  location: MESH_EXTERNAL
  ports:
  - number: 8443
    name: tcp
    protocol: TCP
  resolution: DNS
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: nats-external
spec:
  hosts:
  - nats.test.com
  tcp:
  - match:
    - port: 8443
    timeout: 60s
    route:
    - destination:
        host: nats.test.com
        port:
          number: 8443
      weight: 100
@thedodd I tried your solution, but it did not work for me. Were the Istio sidecars injected into your NATS pods?
At first glance I noticed that I wasn't getting a response over telnet (with the istio sidecar).
A little telnet debugging, no response:
> telnet nats.nats-system.svc.cluster.local 4222
Telnet with a PING -> instant response
>telnet nats.nats-system.svc.cluster.local 4222
PING
INFO {"server_id":"NBB2A2ML5APZWCXAX6SPEQBINHC4B5J2DPYHYRITHXXLQEDW64KVKWMM","server_name":"NBB2A2ML5APZWCXAX6SPEQBINHC4B5J2DPYHYRITHXXLQEDW64KVKWMM","version":"2.1.7","proto":1,"git_commit":"bf0930e","go":"go1.13.10","host":"0.0.0.0","port":4222,"max_payload":1048576,"client_id":1,"client_ip":"127.0.0.1"}
PONG
Interestingly, Istio thinks it's HTTP (raw_buffer):
> istioctl pc listeners <pod-name> --port 4222
ADDRESS PORT MATCH DESTINATION
10.96.249.161 4222 Trans: raw_buffer; App: HTTP Route: nats.nats-system.svc.cluster.local:4222
Istio's explicit protocol selection helped me.
Here the service for NATS doesn't declare tcp or tls. If they added appProtocol explicitly (Kubernetes 1.18+), or named the port tcp-client for example, that would resolve it for Istio.
After renaming the port in the service and on the pod spec:
istioctl pc listeners <pod-name> --port 4222
ADDRESS PORT MATCH DESTINATION
10.96.0.174 4222 ALL Cluster: outbound|4222||nats.nats-system.svc.cluster.local
This seems to have resolved my connectivity issues, but it should be noted that the same would need to be done for the other TCP ports.
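For reference, here is a rough sketch (port names, labels, and the namespace are illustrative, not necessarily what the operator generates) of a Service whose ports Istio would treat as plain TCP, either via the tcp- name prefix or, on Kubernetes 1.18+, via appProtocol:

```yaml
# Illustrative only: a NATS Service with ports named so that Istio's
# protocol selection treats them as raw TCP instead of sniffing HTTP.
apiVersion: v1
kind: Service
metadata:
  name: nats
  namespace: nats-system
spec:
  selector:
    app: nats
  ports:
  - name: tcp-client        # "tcp-" prefix => Istio routes it as TCP
    port: 4222
    targetPort: 4222
    appProtocol: tcp        # alternative hint; requires Kubernetes 1.18+
  - name: tcp-cluster       # route traffic between servers
    port: 6222
    targetPort: 6222
  - name: http-monitor      # /varz, /routez, etc. really are HTTP
    port: 8222
    targetPort: 8222
```

The container ports on the pod spec would need to be renamed to match as well.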
This issue is also mentioned here istio/istio#28623. Are there any plans to support this in the operator? I'm struggling with finding a way to customize port names in an operator-managed service.
appProtocol, as mentioned by @lukeweber, doesn't seem like a viable option yet because it's not scheduled to reach GA until Kubernetes 1.21, and cloud providers may or may not allow you to customize your feature gates.
For the last two days I have been facing the same issue trying to make the NATS Streaming Server work in a namespace with Istio auto sidecar injection enabled. I tried disabling mTLS PeerAuthentication and created a VirtualService with an explicit DestinationRule to make it work with other pods, but nothing worked.
I then exec'd into one of the client pods and started adding console logs to the Express Node.js app on all the external connections to manually debug from the live cluster, and finally figured out that the nats.connect() promise didn't resolve at all.
Eventually, while googling, I stumbled upon this open issue.
Huge thanks to @lukeweber for the detailed comment; I tried it and it worked.
Previously, before stumbling onto this GitHub issue, the Kiali dashboard threw a KIA0601 error on the NATS Streaming Server deployment. I had renamed the deployment's associated service port names (the convention is <protocol_prefix>-<any_name_suffix>) using the http-<suffix> convention instead of tcp-<suffix>, as I wasn't aware that NATS explicitly uses TCP.
Now all the deployments work properly. This explicit tcp prefix on the port name should be documented on the NATS website, under a topic like "NATS working with Istio".
Thanks @narenarjun, will see if I can add a page on this, or feel free to make a PR to the docs too; they can be found here: https://github.com/nats-io/nats.docs/tree/master/nats-on-kubernetes
Sure, will do it before the end of this week. @wallyqs
If anybody ends up coming here after a Google search, here is my TL;DR: your Kubernetes Service should have its port name set to tcp or use a tcp- prefix.
> If anybody ends up coming here after a Google search, here is my TL;DR: your Kubernetes Service should have its port name set to tcp or use a tcp- prefix.
Thanks!
> Istio's explicit protocol selection helped me. [...] After renaming the port in the service and on the pod spec, this seems to have resolved my connectivity issues.
Thanks a lot!!
Hi, sorry for the post-mortem post, but is this fix needed for the tcp/6222 server port as well?
Thank you.