l7mp/stunner

Example app udp-greeter.yaml not working - help needed

Closed this issue · 10 comments

Hi,
I am trying to set up the example to make sure this solution can be used for my scenario, and it looks like all the pods/services are running fine. But when I test the example, I get the error below. I am running this example on an AWS EKS cluster. Any ideas what could be wrong?

./turncat - k8s://stunner/udp-gateway:udp-listener udp://${PEER_IP}:9001
12:08:58.342349 turncat.go:570: turncat WARNING: relay setup failed for client /dev/stdin: could not allocate new TURN relay transport for client file:/dev/stdin: all retransmissions failed for F0PK7+FzGLlJGP6B

NAME                                                               READY   STATUS    RESTARTS   AGE
pod/stunner-auth-5c488547b-96755                                   1/1     Running   0          3d23h
pod/stunner-gateway-operator-controller-manager-79448cb5f5-p9x9f   2/2     Running   0          3d23h
pod/udp-gateway-7bd49f95d9-k8l6d                                   1/1     Running   0          3d3h

NAME                                                                  TYPE           CLUSTER-IP   EXTERNAL-IP   PORT(S)          AGE
service/stunner-auth                                                  ClusterIP      x.x.x.x      <none>        8088/TCP         3d23h
service/stunner-config-discovery                                      ClusterIP      x.x.x.x      <none>        13478/TCP        3d23h
service/stunner-gateway-operator-controller-manager-metrics-service   ClusterIP      x.x.x.x      <none>        8443/TCP         3d23h
service/udp-gateway                                                   LoadBalancer   x.x.x.x      *********     3478:32616/UDP   3d3h

Can you please provide more info? We'd need at least the output of:

kubectl get gateways,gatewayconfigs,gatewayclasses,udproutes.stunner.l7mp.io --all-namespaces -o yaml

plus the logs from the operator and from one of the stunnerd pods (if running), and anything else you think is important for tracking this down.

Hi @rg0now, thanks for the reply. I attached some of the information you requested. I don't see any stunnerd pods running...
stunner-udp-gateway-log.txt
stunner-gateway-operator-logs.txt
stunnerconfig.txt

I can't see any apparent problem with your setup. Can you please elevate the loglevel on the gateway so that we see why the connection hangs and rerun the test? Here is a simple way to set the maximum loglevel:

kubectl -n stunner patch gatewayconfig stunner-gatewayconfig --type=merge -p '{"spec": {"logLevel": "all:TRACE"}}'
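
To double-check that the patch took effect, something like this should print the effective log level (same gatewayconfig as above):

kubectl -n stunner get gatewayconfig stunner-gatewayconfig -o jsonpath='{.spec.logLevel}'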

I updated the log level in stunner-gatewayconfig to trace and stunner-gateway-operator-controller-manager to debug before collecting these logs... please check whether they help.
stunner-gateway-operator-controller.txt

One thing I noticed is that the LoadBalancer (Network LB) service exposes only UDP. Should it expose TCP as well? If so, how do I do that?
service/udp-gateway LoadBalancer x.x.x.x ********* 3478:32616/UDP 3d3h

Unfortunately I'm no expert in AWS load-balancers, but you may be on the right track here: last time we looked at it AWS required a TCP health-checker to accept a UDP LoadBalancer. Can you experiment with the following annotations added to the Gateway?

stunner.l7mp.io/enable-mixed-protocol-lb: "true"
service.beta.kubernetes.io/aws-load-balancer-healthcheck-port: "8086"
service.beta.kubernetes.io/aws-load-balancer-healthcheck-protocol: "http"
service.beta.kubernetes.io/aws-load-balancer-healthcheck-path: "/live"
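
For reference, a rough sketch of how these could sit on the Gateway manifest; the gateway/listener names and gateway class are taken from the udp-greeter example, and the Gateway API version may differ on your cluster:

apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: udp-gateway
  namespace: stunner
  annotations:
    stunner.l7mp.io/enable-mixed-protocol-lb: "true"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-port: "8086"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-protocol: "http"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-path: "/live"
spec:
  gatewayClassName: stunner-gatewayclass
  listeners:
    - name: udp-listener
      port: 3478
      protocol: UDP

Annotating the existing Gateway with kubectl annotate should work just as well if you don't want to re-apply the manifest.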

What is strange is that the stunner-udp-gateway-log.txt actually shows a successful authentication attempt from someone (please check the source IP: is it one of your pods, or is it coming from the outside?). Can you resend the stunner-udp-gateway-log.txt, but this time with the elevated loglevel?
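
As for checking the source IP: listing the pods with their IPs and grepping for the address from the log should tell you quickly whether the traffic is internal (substitute the actual address for the placeholder):

kubectl get pods --all-namespaces -o wide | grep <source-IP>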

I am also seeing this problem:

turncat -v - k8s://stunner/udp-gateway:udp-listener udp://${PEER_IP}:9001:

08:41:32.460190 main.go:81: turncat-cli DEBUG: Reading STUNner config from URI "k8s://stunner/udp-gateway:udp-listener"
08:41:32.460296 main.go:163: turncat-cli DEBUG: Searching for CDS server
08:41:32.460312 k8s_client.go:154: cds-fwd DEBUG: Obtaining kubeconfig
08:41:32.461017 k8s_client.go:161: cds-fwd DEBUG: Creating a Kubernetes client
08:41:32.461312 k8s_client.go:196: cds-fwd DEBUG: Querying CDS server pods in namespace "<all>" using label-selector "stunner.l7mp.io/config-discovery-service=enabled"
08:41:32.488454 k8s_client.go:367: cds-fwd DEBUG: Found pod: stunner-system/stunner-gateway-operator-controller-manager-foo-bar
08:41:32.488604 k8s_client.go:376: cds-fwd DEBUG: Creating a SPDY stream to API server using URL "https://10.0.1.4:16443/api/v1/namespaces/stunner-system/pods/stunner-gateway-operator-controller-manager-foo-bar/portforward"
08:41:32.488725 k8s_client.go:384: cds-fwd DEBUG: Creating a port-forwarder to pod
08:41:32.488771 k8s_client.go:400: cds-fwd DEBUG: Waiting for port-forwarder...
08:41:32.516363 k8s_client.go:419: cds-fwd DEBUG: Port-forwarder connected to pod stunner-system/stunner-gateway-operator-controller-manager-foo-bar at 127.0.0.1:37641
08:41:32.516420 cds_api.go:215: cds-client DEBUG: GET: loading config for gateway stunner/udp-gateway from CDS server 127.0.0.1:37641
08:41:32.527517 main.go:88: turncat-cli DEBUG: Generating STUNner authentication client
08:41:32.527574 main.go:95: turncat-cli DEBUG: Generating STUNner URI
08:41:32.527591 main.go:102: turncat-cli DEBUG: Starting turncat with STUNner URI: turn://8.0.0.8:3478?transport=udp
08:41:32.527637 turncat.go:186: turncat INFO: Turncat client listening on file://stdin, TURN server: turn://8.0.0.8:3478?transport=udp, peer: udp:10.152.183.128:9001
08:41:32.527653 main.go:118: turncat-cli DEBUG: Entering main loop
08:41:32.527739 turncat.go:227: turncat DEBUG: new connection from client /dev/stdin
08:41:32.535533 client.go:110: turnc DEBUG: Resolved STUN server 8.0.0.8:3478 to 8.0.0.8:3478
08:41:32.535563 client.go:119: turnc DEBUG: Resolved TURN server 8.0.0.8:3478 to 8.0.0.8:3478

Can you please elevate the loglevel on the gateway so that we see why the connection hangs and rerun the test?

Just to make it clear: after elevating the stunnerd loglevel to all:TRACE, please repeat the turncat test and post the logs from the stunnerd pod, and not from the operator. The below would do it for the current setup:

kubectl -n stunner logs $(kubectl -n stunner get pod -l app=stunner -o jsonpath='{.items[0].metadata.name}')
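
If it's easier, streaming the logs with the same label selector while you rerun the turncat test should also work:

kubectl -n stunner logs -f -l app=stunner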

This is because we need to see whether the connection request from turncat has made it to stunnerd (if not, then this is an LB issue), and if it did, what happened to the connection after authentication. The last line of the log we see above is this:

05:41:38.565349 handlers.go:25: stunner-auth INFO: static auth request: username="user-1" realm="stunner.l7mp.io" srcAddr=X.X.X.81:39986

We need to see what happened afterwards in the dataplane. Frankly, the whole thing is quite mysterious: if the authentication request were not successful then we would see that clearly in the logs, but if it was (as in our case), then why didn't the client continue with establishing the connection? That's what the trace-level logs would reveal (I hope).

One minor silly thing: after running turncat, try to send something and press Enter, because turncat waits on the standard input for data to be sent to the greeter. I guess you know that anyway, just to be absolutely sure.
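
If you'd rather not type interactively, piping something into turncat's standard input should exercise the same path (same URIs as in your original command):

echo "Greetings!" | ./turncat - k8s://stunner/udp-gateway:udp-listener udp://${PEER_IP}:9001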

Does turncat automatically add in credentials from the deployment, or do we have to add them in the udp connection string?

Theoretically, it should. It actually asks the operator for the running config of the gateway corresponding to the k8s:// URI, so it should see up-to-date settings. It even generates its own ephemeral auth credential if that's the auth mode you've set. So try turncat as above (without the auth credentials), and if you get an authentication error then that's a bug.
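
If you want to see exactly what turncat will pick up, dumping the GatewayConfig shows the configured auth settings (this is the same object we patched above for the loglevel):

kubectl -n stunner get gatewayconfig stunner-gatewayconfig -o yaml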

Closing this for now, feel free to reopen if anything new comes up.