kubernetes-sigs/cloud-provider-kind

Load Balancer connection timeout on OrbStack (macOS 15.0)

tico88612 opened this issue · 10 comments

Environment

  • macOS 15.0 Sequoia
  • OrbStack 1.7.4 (17448)

Reproduce

Create the kind cluster (succeeds):

kind create cluster --name=cloud-provider-test --config=kind-cp-test.yaml

kind-cp-test.yaml

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker
- role: worker

Start cloud-provider-kind (succeeds):

sudo cloud-provider-kind

Install a simple app (e.g. nginx) with a LoadBalancer Service (succeeds):

kubectl apply -f nginx.yaml

nginx.yaml

---
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  type: LoadBalancer
  ports:
    - port: 80
  selector:
    app: nginx
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.17.3
          ports:
            - containerPort: 80

kubectl get svc -A
NAMESPACE     NAME         TYPE           CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
default       kubernetes   ClusterIP      10.96.0.1    <none>        443/TCP                  78m
default       nginx        LoadBalancer   10.96.48.2   172.17.0.6    80:30697/TCP             13m
kube-system   kube-dns     ClusterIP      10.96.0.10   <none>        53/UDP,53/TCP,9153/TCP   78m

Open the browser to check (times out):

http://172.17.0.6

However, the control plane (172.17.0.3:6443) is reachable.
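For reference, a rough sketch of equivalent checks from a terminal, using the addresses above:

# LoadBalancer external IP: times out
curl -v --connect-timeout 5 http://172.17.0.6

# Control-plane address: the connection is accepted (the API server may answer 401/403 without credentials)
curl -vk --connect-timeout 5 https://172.17.0.3:6443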

Please don't hesitate to contact me if you need more information.

Logs

I1003 10:26:39.118171   12762 app.go:46] FLAG: --enable-lb-port-mapping="false"
I1003 10:26:39.118186   12762 app.go:46] FLAG: --enable-log-dumping="false"
I1003 10:26:39.118188   12762 app.go:46] FLAG: --logs-dir=""
I1003 10:26:39.118191   12762 app.go:46] FLAG: --v="2"
I1003 10:26:39.245971   12762 controller.go:174] probe HTTP address https://cloud-provider-test-control-plane:6443
I1003 10:26:39.249767   12762 controller.go:177] Failed to connect to HTTP address https://cloud-provider-test-control-plane:6443: Get "https://cloud-provider-test-control-plane:6443": dial tcp: lookup cloud-provider-test-control-plane: no such host
I1003 10:26:39.249792   12762 controller.go:174] probe HTTP address https://cloud-provider-test-control-plane:6443
I1003 10:26:39.251740   12762 controller.go:177] Failed to connect to HTTP address https://cloud-provider-test-control-plane:6443: Get "https://cloud-provider-test-control-plane:6443": dial tcp: lookup cloud-provider-test-control-plane: no such host
I1003 10:26:40.252840   12762 controller.go:174] probe HTTP address https://cloud-provider-test-control-plane:6443
I1003 10:26:40.256895   12762 controller.go:177] Failed to connect to HTTP address https://cloud-provider-test-control-plane:6443: Get "https://cloud-provider-test-control-plane:6443": dial tcp: lookup cloud-provider-test-control-plane: no such host
I1003 10:26:42.257436   12762 controller.go:174] probe HTTP address https://cloud-provider-test-control-plane:6443
I1003 10:26:42.261465   12762 controller.go:177] Failed to connect to HTTP address https://cloud-provider-test-control-plane:6443: Get "https://cloud-provider-test-control-plane:6443": dial tcp: lookup cloud-provider-test-control-plane: no such host
I1003 10:26:45.262312   12762 controller.go:174] probe HTTP address https://cloud-provider-test-control-plane:6443
I1003 10:26:45.266489   12762 controller.go:177] Failed to connect to HTTP address https://cloud-provider-test-control-plane:6443: Get "https://cloud-provider-test-control-plane:6443": dial tcp: lookup cloud-provider-test-control-plane: no such host
E1003 10:26:49.267704   12762 controller.go:151] Failed to connect to apiserver cloud-provider-test: <nil>
I1003 10:26:49.547279   12762 controller.go:174] probe HTTP address https://127.0.0.1:52893
I1003 10:26:49.553605   12762 controller.go:84] Creating new cloud provider for cluster cloud-provider-test
I1003 10:26:49.565547   12762 controller.go:91] Starting cloud controller for cluster cloud-provider-test
I1003 10:26:49.565572   12762 controller.go:235] Starting service controller
I1003 10:26:49.565574   12762 node_controller.go:176] Sending events to api server.
I1003 10:26:49.565584   12762 shared_informer.go:313] Waiting for caches to sync for service
I1003 10:26:49.565601   12762 node_controller.go:185] Waiting for informer caches to sync
I1003 10:26:49.565611   12762 envvar.go:172] "Feature gate default state" feature="WatchListClient" enabled=false
I1003 10:26:49.565633   12762 envvar.go:172] "Feature gate default state" feature="InformerResourceVersion" enabled=false
I1003 10:26:49.568629   12762 reflector.go:368] Caches populated for *v1.Service from pkg/mod/k8s.io/client-go@v0.31.1/tools/cache/reflector.go:243
I1003 10:26:49.568789   12762 reflector.go:368] Caches populated for *v1.Node from pkg/mod/k8s.io/client-go@v0.31.1/tools/cache/reflector.go:243
I1003 10:26:49.666755   12762 shared_informer.go:320] Caches are synced for service
I1003 10:26:49.666814   12762 instances.go:47] Check instance metadata for cloud-provider-test-control-plane
I1003 10:26:49.666828   12762 controller.go:737] Syncing backends for all LB services.
I1003 10:26:49.666830   12762 instances.go:47] Check instance metadata for cloud-provider-test-worker3
I1003 10:26:49.666875   12762 instances.go:47] Check instance metadata for cloud-provider-test-worker2
I1003 10:26:49.666883   12762 instances.go:47] Check instance metadata for cloud-provider-test-worker
I1003 10:26:49.666836   12762 controller.go:741] Successfully updated 0 out of 0 load balancers to direct traffic to the updated set of nodes
I1003 10:26:49.666937   12762 controller.go:737] Syncing backends for all LB services.
I1003 10:26:49.666955   12762 controller.go:741] Successfully updated 0 out of 0 load balancers to direct traffic to the updated set of nodes
I1003 10:26:49.666962   12762 controller.go:737] Syncing backends for all LB services.
I1003 10:26:49.666966   12762 controller.go:741] Successfully updated 0 out of 0 load balancers to direct traffic to the updated set of nodes
I1003 10:26:49.666969   12762 controller.go:737] Syncing backends for all LB services.
I1003 10:26:49.666973   12762 controller.go:741] Successfully updated 0 out of 0 load balancers to direct traffic to the updated set of nodes
I1003 10:26:49.703384   12762 instances.go:75] instance metadata for cloud-provider-test-worker2: &cloudprovider.InstanceMetadata{ProviderID:"kind://cloud-provider-test/kind/cloud-provider-test-worker2", InstanceType:"kind-node", NodeAddresses:[]v1.NodeAddress{v1.NodeAddress{Type:"Hostname", Address:"cloud-provider-test-worker2"}, v1.NodeAddress{Type:"InternalIP", Address:"172.17.0.5"}, v1.NodeAddress{Type:"InternalIP", Address:"fc00:f853:ccd:e793::5"}}, Zone:"", Region:"", AdditionalLabels:map[string]string(nil)}
I1003 10:26:49.703383   12762 instances.go:75] instance metadata for cloud-provider-test-control-plane: &cloudprovider.InstanceMetadata{ProviderID:"kind://cloud-provider-test/kind/cloud-provider-test-control-plane", InstanceType:"kind-node", NodeAddresses:[]v1.NodeAddress{v1.NodeAddress{Type:"Hostname", Address:"cloud-provider-test-control-plane"}, v1.NodeAddress{Type:"InternalIP", Address:"172.17.0.3"}, v1.NodeAddress{Type:"InternalIP", Address:"fc00:f853:ccd:e793::3"}}, Zone:"", Region:"", AdditionalLabels:map[string]string(nil)}
I1003 10:26:49.703396   12762 instances.go:75] instance metadata for cloud-provider-test-worker3: &cloudprovider.InstanceMetadata{ProviderID:"kind://cloud-provider-test/kind/cloud-provider-test-worker3", InstanceType:"kind-node", NodeAddresses:[]v1.NodeAddress{v1.NodeAddress{Type:"Hostname", Address:"cloud-provider-test-worker3"}, v1.NodeAddress{Type:"InternalIP", Address:"172.17.0.2"}, v1.NodeAddress{Type:"InternalIP", Address:"fc00:f853:ccd:e793::2"}}, Zone:"", Region:"", AdditionalLabels:map[string]string(nil)}
I1003 10:26:49.703387   12762 instances.go:75] instance metadata for cloud-provider-test-worker: &cloudprovider.InstanceMetadata{ProviderID:"kind://cloud-provider-test/kind/cloud-provider-test-worker", InstanceType:"kind-node", NodeAddresses:[]v1.NodeAddress{v1.NodeAddress{Type:"Hostname", Address:"cloud-provider-test-worker"}, v1.NodeAddress{Type:"InternalIP", Address:"172.17.0.4"}, v1.NodeAddress{Type:"InternalIP", Address:"fc00:f853:ccd:e793::4"}}, Zone:"", Region:"", AdditionalLabels:map[string]string(nil)}
I1003 10:26:49.718931   12762 node_controller.go:271] Update 4 nodes status took 52.178083ms.
I1003 10:26:56.767000   12762 controller.go:402] Ensuring load balancer for service default/nginx
I1003 10:26:56.767031   12762 controller.go:958] Adding finalizer to service default/nginx
I1003 10:26:56.767087   12762 event.go:389] "Event occurred" object="default/nginx" fieldPath="" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer"
I1003 10:26:56.786032   12762 loadbalancer.go:28] Ensure LoadBalancer cluster: cloud-provider-test service: nginx
I1003 10:26:56.802042   12762 server.go:95] container kindccm-AZ3RL2ZGY2KITRSZTRI5E6KAXYMFVJLTR72M5IJW for loadbalancer is not running
I1003 10:26:56.840544   12762 server.go:104] creating container for loadbalancer
I1003 10:26:56.840633   12762 server.go:249] creating loadbalancer with parameters: [--detach --tty --label io.x-k8s.cloud-provider-kind.cluster=cloud-provider-test --label io.x-k8s.cloud-provider-kind.loadbalancer.name=cloud-provider-test/default/nginx --net kind --init=false --hostname kindccm-AZ3RL2ZGY2KITRSZTRI5E6KAXYMFVJLTR72M5IJW --privileged --restart=on-failure --sysctl=net.ipv4.ip_forward=1 --sysctl=net.ipv4.conf.all.rp_filter=0 --publish=80/TCP --publish=10000/TCP --publish-all docker.io/envoyproxy/envoy:v1.30.1 bash -c echo -en 'node:
  cluster: cloud-provider-kind
  id: cloud-provider-kind-id

dynamic_resources:
  cds_config:
    resource_api_version: V3
    path: /home/envoy/cds.yaml
  lds_config:
    resource_api_version: V3
    path: /home/envoy/lds.yaml

admin:
  access_log_path: /dev/stdout
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 10000
' > /home/envoy/envoy.yaml && touch /home/envoy/cds.yaml && touch /home/envoy/lds.yaml && while true; do envoy -c /home/envoy/envoy.yaml && break; sleep 1; done]
I1003 10:26:56.992386   12762 server.go:112] updating loadbalancer
I1003 10:26:56.992446   12762 proxy.go:239] address type Hostname, only InternalIP supported
I1003 10:26:56.992454   12762 proxy.go:239] address type Hostname, only InternalIP supported
I1003 10:26:56.992457   12762 proxy.go:239] address type Hostname, only InternalIP supported
I1003 10:26:56.992465   12762 proxy.go:239] address type Hostname, only InternalIP supported
I1003 10:26:56.992474   12762 proxy.go:274] envoy config info: &{HealthCheckPort:10256 ServicePorts:map[IPv4_80_TCP:{Listener:{Address:0.0.0.0 Port:80 Protocol:TCP} Cluster:[{Address:172.17.0.2 Port:30697 Protocol:TCP} {Address:172.17.0.3 Port:30697 Protocol:TCP} {Address:172.17.0.4 Port:30697 Protocol:TCP} {Address:172.17.0.5 Port:30697 Protocol:TCP}]}] SessionAffinity:None SourceRanges:[]}
I1003 10:26:56.992631   12762 proxy.go:292] updating loadbalancer with config
resources:
- "@type": type.googleapis.com/envoy.config.listener.v3.Listener
  name: listener_IPv4_80_TCP
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 80
      protocol: TCP
  filter_chains:
  - filters:
    - name: envoy.filters.network.tcp_proxy
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
        access_log:
        - name: envoy.file_access_log
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
        stat_prefix: tcp_proxy
        cluster: cluster_IPv4_80_TCP
I1003 10:26:57.014812   12762 proxy.go:303] updating loadbalancer with config
resources:
- "@type": type.googleapis.com/envoy.config.cluster.v3.Cluster
  name: cluster_IPv4_80_TCP
  connect_timeout: 5s
  type: STATIC
  lb_policy: RANDOM
  health_checks:
  - timeout: 5s
    interval: 3s
    unhealthy_threshold: 2
    healthy_threshold: 1
    no_traffic_interval: 5s
    always_log_health_check_failures: true
    always_log_health_check_success: true
    event_log_path: /dev/stdout
    http_health_check:
      path: /healthz
  load_assignment:
    cluster_name: cluster_IPv4_80_TCP
    endpoints:
      - lb_endpoints:
        - endpoint:
            health_check_config:
              port_value: 10256
            address:
              socket_address:
                address: 172.17.0.2
                port_value: 30697
                protocol: TCP
      - lb_endpoints:
        - endpoint:
            health_check_config:
              port_value: 10256
            address:
              socket_address:
                address: 172.17.0.3
                port_value: 30697
                protocol: TCP
      - lb_endpoints:
        - endpoint:
            health_check_config:
              port_value: 10256
            address:
              socket_address:
                address: 172.17.0.4
                port_value: 30697
                protocol: TCP
      - lb_endpoints:
        - endpoint:
            health_check_config:
              port_value: 10256
            address:
              socket_address:
                address: 172.17.0.5
                port_value: 30697
                protocol: TCP
I1003 10:26:57.071074   12762 server.go:120] updating loadbalancer tunnels on userspace
I1003 10:26:57.082300   12762 tunnel.go:34] found port maps map[10000:32785 80:32784] associated to container kindccm-AZ3RL2ZGY2KITRSZTRI5E6KAXYMFVJLTR72M5IJW
I1003 10:26:57.093051   12762 tunnel.go:41] setting IPv4 address 172.17.0.6 associated to container kindccm-AZ3RL2ZGY2KITRSZTRI5E6KAXYMFVJLTR72M5IJW
I1003 10:26:57.095229   12762 tunnel.go:112] Starting tunnel on 172.17.0.6:10000
I1003 10:26:57.095335   12762 tunnel.go:112] Starting tunnel on 172.17.0.6:80
I1003 10:26:57.095372   12762 server.go:128] get loadbalancer status
I1003 10:26:57.106146   12762 controller.go:999] Patching status for service default/nginx
I1003 10:26:57.106225   12762 event.go:389] "Event occurred" object="default/nginx" fieldPath="" kind="Service" apiVersion="v1" type="Normal" reason="EnsuredLoadBalancer" message="Ensured load balancer"
I1003 10:27:19.720289   12762 instances.go:47] Check instance metadata for cloud-provider-test-control-plane
I1003 10:27:19.720280   12762 instances.go:47] Check instance metadata for cloud-provider-test-worker
I1003 10:27:19.720393   12762 instances.go:47] Check instance metadata for cloud-provider-test-worker2
I1003 10:27:19.720280   12762 instances.go:47] Check instance metadata for cloud-provider-test-worker3
I1003 10:27:19.779453   12762 instances.go:75] instance metadata for cloud-provider-test-worker: &cloudprovider.InstanceMetadata{ProviderID:"kind://cloud-provider-test/kind/cloud-provider-test-worker", InstanceType:"kind-node", NodeAddresses:[]v1.NodeAddress{v1.NodeAddress{Type:"Hostname", Address:"cloud-provider-test-worker"}, v1.NodeAddress{Type:"InternalIP", Address:"172.17.0.4"}, v1.NodeAddress{Type:"InternalIP", Address:"fc00:f853:ccd:e793::4"}}, Zone:"", Region:"", AdditionalLabels:map[string]string(nil)}
I1003 10:27:19.779497   12762 instances.go:75] instance metadata for cloud-provider-test-worker2: &cloudprovider.InstanceMetadata{ProviderID:"kind://cloud-provider-test/kind/cloud-provider-test-worker2", InstanceType:"kind-node", NodeAddresses:[]v1.NodeAddress{v1.NodeAddress{Type:"Hostname", Address:"cloud-provider-test-worker2"}, v1.NodeAddress{Type:"InternalIP", Address:"172.17.0.5"}, v1.NodeAddress{Type:"InternalIP", Address:"fc00:f853:ccd:e793::5"}}, Zone:"", Region:"", AdditionalLabels:map[string]string(nil)}
I1003 10:27:19.779463   12762 instances.go:75] instance metadata for cloud-provider-test-worker3: &cloudprovider.InstanceMetadata{ProviderID:"kind://cloud-provider-test/kind/cloud-provider-test-worker3", InstanceType:"kind-node", NodeAddresses:[]v1.NodeAddress{v1.NodeAddress{Type:"Hostname", Address:"cloud-provider-test-worker3"}, v1.NodeAddress{Type:"InternalIP", Address:"172.17.0.2"}, v1.NodeAddress{Type:"InternalIP", Address:"fc00:f853:ccd:e793::2"}}, Zone:"", Region:"", AdditionalLabels:map[string]string(nil)}
I1003 10:27:19.779547   12762 instances.go:75] instance metadata for cloud-provider-test-control-plane: &cloudprovider.InstanceMetadata{ProviderID:"kind://cloud-provider-test/kind/cloud-provider-test-control-plane", InstanceType:"kind-node", NodeAddresses:[]v1.NodeAddress{v1.NodeAddress{Type:"Hostname", Address:"cloud-provider-test-control-plane"}, v1.NodeAddress{Type:"InternalIP", Address:"172.17.0.3"}, v1.NodeAddress{Type:"InternalIP", Address:"fc00:f853:ccd:e793::3"}}, Zone:"", Region:"", AdditionalLabels:map[string]string(nil)}
I1003 10:27:19.802847   12762 node_controller.go:271] Update 4 nodes status took 82.82725ms.

I'm not familiar with OrbStack; can you check that the port mapping works correctly?

I1003 10:26:57.082300 12762 tunnel.go:34] found port maps map[10000:32785 80:32784] associated to container kindccm-AZ3RL2ZGY2KITRSZTRI5E6KAXYMFVJLTR72M5IJW

curl localhost:32784 should be equivalent to curl 172.17.0.6:80
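For example, to compare both paths (host ports taken from the tunnel log quoted above):

# via the docker-published host port
curl -v http://localhost:32784

# via the tunnel on the LoadBalancer IP
curl -v http://172.17.0.6:80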

Unfortunately, it said ERR_CONNECTION_RESET.

Can you test, without kind and with another container, that the port-forwarding functionality works?

https://docs.orbstack.dev/docker/network#port-forwarding
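For example, a minimal check with a plain nginx container (the host port 8080 here is arbitrary):

docker run -d --rm --name pf-test -p 8080:80 nginx
curl -v http://localhost:8080
docker stop pf-test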

Yes, I have tested nginx via docker-compose, and it works.

version: '3'
services:
  nginx:
    image: nginx:latest
    ports:
      - "29999:80"
(screenshot attached)

OK, so docker ps should show you the running containers and their port maps:

docker ps
CONTAINER ID   IMAGE                           COMMAND                  CREATED       STATUS        PORTS                                           NAMES
2d6f9816bce7   envoyproxy/envoy:v1.30.1        "/docker-entrypoint.…"   4 days ago    Up 2 days     0.0.0.0:49153->10000/tcp, :::49153->10000/tcp   kindccm-PFB765GEO5MNW3Z3N7AZB7XVPFXJDHY7LSKJDIJK
a6b4257d4d66   moby/buildkit:buildx-stable-1   "buildkitd"              13 days ago   Up 46 hours                                                   buildx_buildkit_knp-builder0
e7be87fccc30   kindest/node:v1.31.0            "/usr/local/bin/entr…"   13 days ago   Up 2 days                                                     kind-worker
228ca43b3a7b   kindest/node:v1.31.0            "/usr/local/bin/entr…"   13 days ago   Up 2 days                                                     kind-worker2
10952219a467   kindest/node:v1.31.0            "/usr/local/bin/entr…"   13 days ago   Up 2 days     127.0.0.1:40639->6443/tcp                       kind-control-plane

You tried to connect directly to the load balancer and it didn't work, so you can inspect the kindccm-... container logs to see why the packets are not being forwarded.
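For example, something like this (container name taken from the logs above; the filter variant is just a convenience):

docker logs kindccm-AZ3RL2ZGY2KITRSZTRI5E6KAXYMFVJLTR72M5IJW

# or, without typing the exact name:
docker logs $(docker ps -q --filter name=kindccm)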

https://gist.github.com/tico88612/4afae6cbaf899a9ccaab276d4da7cfc1

I looked at these logs, and everything is healthy (although I rebooted several times, I did confirm that the Service's NodePort is working correctly).

CONTAINER ID   IMAGE                                                                                    COMMAND                  CREATED         STATUS                      PORTS                                                                                    NAMES
3db646a43aa1   envoyproxy/envoy:v1.30.1                                                                 "/docker-entrypoint.…"   6 minutes ago   Up 6 minutes                0.0.0.0:32798->80/tcp, :::32798->80/tcp, 0.0.0.0:32799->10000/tcp, :::32799->10000/tcp   kindccm-AZ3RL2ZGY2KITRSZTRI5E6KAXYMFVJLTR72M5IJW
e2cd3ba33d99   kindest/node:v1.31.0                                                                     "/usr/local/bin/entr…"   12 hours ago    Up 12 hours                                                                                                          cloud-provider-test-worker2
2fe311e10501   kindest/node:v1.31.0                                                                     "/usr/local/bin/entr…"   12 hours ago    Up 12 hours                                                                                                          cloud-provider-test-worker3
e3b7c32bdd63   kindest/node:v1.31.0                                                                     "/usr/local/bin/entr…"   12 hours ago    Up 12 hours                                                                                                          cloud-provider-test-worker
779fa4a60361   kindest/node:v1.31.0                                                                     "/usr/local/bin/entr…"   12 hours ago    Up 12 hours                 127.0.0.1:52893->6443/tcp                                                                cloud-provider-test-control-plane

e.g. 172.17.0.2:30910, 172.17.0.3:30910, 172.17.0.4:30910, and 172.17.0.5:30910 all connect to nginx properly.

Can you exec into the envoy container and verify the connectivity:

  1. from the host to the container
  2. from the container to the kind node and NodePort

You can install curl and netcat in the envoy container to validate those things; it's an Ubuntu-based image IIRC.
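For example, something along these lines (names, IPs, and the NodePort are taken from the outputs above; this is a sketch, adjust as needed):

# 1. from the host to the loadbalancer container
curl -v http://172.17.0.6:80

# 2. from inside the envoy container to a kind node and its NodePort
docker exec -it -u root kindccm-AZ3RL2ZGY2KITRSZTRI5E6KAXYMFVJLTR72M5IJW bash
apt-get update && apt-get install -y curl netcat-openbsd
curl -v http://172.17.0.2:30697
nc -vz 172.17.0.2 30697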

I'm trying to debug this, and I've observed something strange when creating a bridge network on OrbStack (orbstack/orbstack#1496).

  1. From the host to the envoyproxy container, it seems to fail.
  2. From the envoyproxy container to the kind nodes and NodePort, it works correctly.

I'll keep an eye on this and report back if there are any updates. Thanks for your help!

Hi @aojea, thank you for following up on this issue; I appreciate it!

According to the OrbStack architecture docs, it also runs containers inside a VM.

As the README.md says:

Mac and Windows run the containers inside a VM and, on the contrary to Linux, the KIND nodes are not reachable from the host, so the LoadBalancer assigned IP is not working for users.
To solve this problem, cloud-provider-kind, leverages the existing docker portmap capabilities to expose the Loadbalancer IP and Ports on the host.

After testing on OrbStack myself and with other community members, connections to the external IP work when running:

sudo cloud-provider-kind --enable-lb-port-mapping=true
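The host port that docker publishes for the load balancer container can also be looked up with docker port, e.g. (a sketch; container name from the docker ps output earlier):

docker port kindccm-AZ3RL2ZGY2KITRSZTRI5E6KAXYMFVJLTR72M5IJW 80
curl -v http://localhost:<host port printed above>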

If this is expected behavior, you can close the issue. Thank you!

aojea commented

It seems the OrbStack architecture may be a bit different (https://docs.orbstack.dev/architecture#network). If it works for you, or if nobody wants to spend some time analyzing the problem further, then yes, we can close.