l7mp/stunner

turn ERROR: Failed to handle datagram: failed to handle CreatePermission-request from [ip]:37462: no allocation found [ip]:37462:[::]:3478

Closed this issue · 2 comments

Having difficulties running STUNner in headless mode on a single-node kURL environment. Given that it's running on a single node, there is no load balancer controller responding to request for new Service with Type: LoadBalancer. Hence, the Gateway is defined as ClusterIP and the UDP:3478 port gets exposed externally via Ingress NGINX. The nginx ingress controller is configured with hostNetwork: true listening on port 80, 433, and 3478.

Establishing connection to STUNner on UDP:3478 is working as expected, but it then stuck in an error loop of Failed to handle datagram with no allocation found.. any suggestion on how to further debug/fix the issue?

STUNner is configured as follow:

  • Gateway:
     apiVersion: gateway.networking.k8s.io/v1
     kind: Gateway
     metadata:
       name: stunner-gateway
       annotations:
         stunner.l7mp.io/service-type: ClusterIP
     spec:
       gatewayClassName: stunner-gatewayclass
       listeners:
       - name: udp-on-turn-listener
         port: 3478
         protocol: TURN-UDP	
  • UDPRoute:
     apiVersion: stunner.l7mp.io/v1
     kind: UDPRoute
     metadata:
       name: stunner-loopback
     spec:
       parentRefs:
       - name: stunner-gateway
       rules:
       - backendRefs:
         - name: stunner-loopback
           namespace: default
  • Service:
     apiVersion: v1
     kind: Service
     metadata:
       name: stunner-loopback
     spec:
       type: ClusterIP
       selector:
         app: stunner
       ports:
       - name: udp-on-turn
         port: 3478
         protocol: UDP
         targetPort: 3478

The server logs are:

15:10:06.910022 main.go:75: stunnerd INFO: starting stunnerd instance "default/stunner-gateway"
15:10:06.910115 main.go:105: stunnerd INFO: bootstrapping with minimal config
15:10:06.910129 main.go:109: stunnerd INFO: watching configuration at origin "http://10.32.0.70:13478"
15:10:06.910599 reconcile.go:112: stunner INFO: setting loglevel to "all:INFO"
15:10:06.910661 reconcile.go:140: stunner WARNING: running with no listeners
15:10:06.910673 reconcile.go:156: stunner WARNING: running with no clusters: all traffic will be dropped
15:10:06.910682 reconcile.go:176: stunner INFO: reconciliation ready: new objects: 2, changed objects: 0, deleted objects: 0, started objects: 0, restarted objects: 0
15:10:06.910692 reconcile.go:180: stunner INFO: status: READY, realm: stunner.l7mp.io, authentication: static, listeners: NONE, active allocations: 0
15:10:06.911844 cds_client.go:83: cds-client INFO: connection successfully opened to config discovery server at ws://10.32.0.70:13478/api/v1/configs/default/stunner-gateway?watch=true
15:10:06.913104 reconcile.go:112: stunner INFO: setting loglevel to "all:DEBUG"
15:10:06.913494 auth.go:56: stunner-auth DEBUG: using authentication: static
15:10:06.913517 reconcile.go:131: auth-manager DEBUG: reconciliation ready: to-be-created: 0, changed: 1, deleted: 0
15:10:06.913623 reconcile.go:131: listener-manager DEBUG: reconciliation ready: to-be-created: 1, changed: 0, deleted: 0
15:10:06.915273 reconcile.go:131: cluster-manager DEBUG: reconciliation ready: to-be-created: 1, changed: 0, deleted: 0
15:10:06.915320 server.go:28: stunner INFO: listener default/stunner-gateway/udp-on-turn-listener: [turn-udp://10.32.0.90:3478<0:0>] (re)starting
15:10:06.915334 server.go:45: stunner INFO: setting up UDP listener socket pool at 0.0.0.0:3478 with 16 readloop threads
15:10:06.915915 server.go:166: stunner INFO: listener default/stunner-gateway/udp-on-turn-listener: TURN server running
15:10:06.915937 reconcile.go:176: stunner INFO: reconciliation ready: new objects: 2, changed objects: 2, deleted objects: 0, started objects: 1, restarted objects: 0
15:10:06.915961 reconcile.go:180: stunner INFO: status: READY, realm: stunner.l7mp.io, authentication: static, listeners: default/stunner-gateway/udp-on-turn-listener: [turn-udp://10.32.0.90:3478<0:0>], active allocations: 0
15:12:09.005952 server.go:39: turn DEBUG: Received 20 bytes of udp from 10.32.0.1:53688 on [::]:3478
15:12:09.006003 server.go:64: turn DEBUG: Handling TURN packet
15:12:09.006023 stun.go:12: turn DEBUG: Received BindingRequest from 10.32.0.1:53688
15:12:09.006163 server.go:39: turn DEBUG: Received 20 bytes of udp from 10.32.0.1:52114 on [::]:3478
15:12:09.006180 server.go:64: turn DEBUG: Handling TURN packet
15:12:09.006188 stun.go:12: turn DEBUG: Received BindingRequest from 10.32.0.1:52114
15:12:09.056479 server.go:39: turn DEBUG: Received 28 bytes of udp from 10.32.0.1:60373 on [::]:3478
15:12:09.056522 server.go:64: turn DEBUG: Handling TURN packet
15:12:09.056538 turn.go:18: turn DEBUG: Received AllocateRequest from 10.32.0.1:60373
15:12:09.056876 server.go:39: turn DEBUG: Received 28 bytes of udp from 10.32.0.1:52089 on [::]:3478
15:12:09.056929 server.go:64: turn DEBUG: Handling TURN packet
15:12:09.056941 turn.go:18: turn DEBUG: Received AllocateRequest from 10.32.0.1:52089
15:12:09.072498 server.go:39: turn DEBUG: Received 120 bytes of udp from 10.32.0.1:58237 on [::]:3478
15:12:09.072543 server.go:64: turn DEBUG: Handling TURN packet
15:12:09.072557 turn.go:18: turn DEBUG: Received AllocateRequest from 10.32.0.1:58237
15:12:09.072616 handlers.go:25: stunner-auth INFO: static auth request: username="guest" realm="stunner.l7mp.io" srcAddr=10.32.0.1:58237
15:12:09.072647 handlers.go:30: stunner-auth DEBUG: static auth request: valid username
15:12:09.072873 allocation_manager.go:116: turn DEBUG: Listening on relay address: 10.32.0.90:49312
15:12:09.073417 server.go:39: turn DEBUG: Received 120 bytes of udp from 10.32.0.1:34128 on [::]:3478
15:12:09.073482 server.go:64: turn DEBUG: Handling TURN packet
15:12:09.073498 turn.go:18: turn DEBUG: Received AllocateRequest from 10.32.0.1:34128
15:12:09.073524 handlers.go:25: stunner-auth INFO: static auth request: username="guest" realm="stunner.l7mp.io" srcAddr=10.32.0.1:34128
15:12:09.073537 handlers.go:30: stunner-auth DEBUG: static auth request: valid username
15:12:09.073680 allocation_manager.go:116: turn DEBUG: Listening on relay address: 10.32.0.90:60503
15:12:09.293926 server.go:39: turn DEBUG: Received 20 bytes of udp from 10.32.0.1:51530 on [::]:3478
15:12:09.293971 server.go:64: turn DEBUG: Handling TURN packet
15:12:09.293996 stun.go:12: turn DEBUG: Received BindingRequest from 10.32.0.1:51530
15:12:09.324390 server.go:39: turn DEBUG: Received 28 bytes of udp from 10.32.0.1:44682 on [::]:3478
15:12:09.324673 server.go:64: turn DEBUG: Handling TURN packet
15:12:09.324707 turn.go:18: turn DEBUG: Received AllocateRequest from 10.32.0.1:44682
15:12:09.345078 server.go:39: turn DEBUG: Received 120 bytes of udp from 10.32.0.1:47189 on [::]:3478
15:12:09.345351 server.go:64: turn DEBUG: Handling TURN packet
15:12:09.345381 turn.go:18: turn DEBUG: Received AllocateRequest from 10.32.0.1:47189
15:12:09.345403 handlers.go:25: stunner-auth INFO: static auth request: username="guest" realm="stunner.l7mp.io" srcAddr=10.32.0.1:47189
15:12:09.346989 handlers.go:30: stunner-auth DEBUG: static auth request: valid username
15:12:09.347165 allocation_manager.go:116: turn DEBUG: Listening on relay address: 10.32.0.90:41840
15:12:09.365122 server.go:39: turn DEBUG: Received 124 bytes of udp from 10.32.0.1:37462 on [::]:3478
15:12:09.365214 server.go:64: turn DEBUG: Handling TURN packet
15:12:09.365230 turn.go:214: turn DEBUG: Received CreatePermission from 10.32.0.1:37462
15:12:09.365285 server.go:202: turn ERROR: Failed to handle datagram: failed to handle CreatePermission-request from 10.32.0.1:37462: no allocation found 10.32.0.1:37462:[::]:3478
15:12:09.365895 server.go:39: turn DEBUG: Received 124 bytes of udp from 10.32.0.1:41700 on [::]:3478
rg0now commented

I'm afraid this has to do something with the weird way in which NGINX forwards UDP packets. It seems stunnerd gets every single packet from a different source port:

15:12:09.005952 server.go:39: turn DEBUG: Received 20 bytes of udp from 10.32.0.1:53688 on [::]:3478
15:12:09.006163 server.go:39: turn DEBUG: Received 20 bytes of udp from 10.32.0.1:52114 on [::]:3478
15:12:09.056479 server.go:39: turn DEBUG: Received 28 bytes of udp from 10.32.0.1:60373 on [::]:3478
15:12:09.056876 server.go:39: turn DEBUG: Received 28 bytes of udp from 10.32.0.1:52089 on [::]:3478
15:12:09.072498 server.go:39: turn DEBUG: Received 120 bytes of udp from 10.32.0.1:58237 on [::]:3478
15:12:09.073417 server.go:39: turn DEBUG: Received 120 bytes of udp from 10.32.0.1:34128 on [::]:3478
15:12:09.293926 server.go:39: turn DEBUG: Received 20 bytes of udp from 10.32.0.1:51530 on [::]:3478
15:12:09.324390 server.go:39: turn DEBUG: Received 28 bytes of udp from 10.32.0.1:44682 on [::]:3478
15:12:09.345078 server.go:39: turn DEBUG: Received 120 bytes of udp from 10.32.0.1:47189 on [::]:3478
15:12:09.365122 server.go:39: turn DEBUG: Received 124 bytes of udp from 10.32.0.1:37462 on [::]:3478
15:12:09.365895 server.go:39: turn DEBUG: Received 124 bytes of udp from 10.32.0.1:41700 on [::]:3478

This makes me believe that NGINX somehow seems to think it has to create a new UDP proxy connection for every client packet. The first time stunnerd receives a TURN packet that assumes prior state (that's the CreatePermission request) it fails as there is no allocation for that source port (the corresponding CreateAllocation that would have created that state came from a different port from NGINX).

My advice would be to either fix NGINX (maybe it runs with an extra small UDP conntrack TTL?? seems improbable) or remove NGINX all together from the loop. For instance, you can deploy stunnerd into the host-network namespace and use the static TURN server URI turn:<node-public-IP>:3478. This may, however, create port clashes between NGINX and stunnerd, which is not a problem if stunnerd uses only UDP and NGINX TCP.

You can also choose to expose stunnerd on a NodePort by setting the stunner.l7mp.io/service-type: NodePort annotation on your Gateway (this would be my pick). Unfortunately, you cannot request a particular nodeport via the Gateway API so the public port will be dynamic, but then you can use STUNner's auth-service to generate a dynamic ICE server config for your clients.

Thanks for the pointer!

Running it with hostNetwork: true does solved the issue. So the "bug" is indeed on nginx.

Using NodePort would also work, but having the public port be dynamic would make configuring the network firewall harder (unpredictable).