GKE LoadBalancer doesn't work with service deployed by Skaffold
thesandlord opened this issue · 15 comments
I have a service with type: LoadBalancer that I deploy with Skaffold. The service is created fine and the load balancer shows as healthy on the GCP console, but when I run kubectl get svc the external IP address never gets resolved and is stuck in <pending>. Everything works if I deploy the same service using kubectl apply.
I actually have this on video as well: https://youtu.be/JUFIF9QMN9M?t=1630
This has happened multiple times with multiple clusters, projects, and services. @ahmetb is experiencing the same issue as well.
Right now, I'm thinking there is something Skaffold does to the service (labels?) which is preventing the service from getting the external IP address.
Information
- Skaffold version: v.0.11.0
- Operating system: Linux
- Contents of skaffold.yaml:
Service YAML
apiVersion: v1
kind: Service
metadata:
  name: uptimecheck
  labels:
    app: uptimecheck
spec:
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 3000
    protocol: TCP
    name: http
  selector:
    app: "uptimecheck"
Skaffold YAML
apiVersion: skaffold/v1alpha2
kind: Config
build:
  artifacts:
  - imageName: gcr.io/xxx/xxx
deploy:
  kubectl:
    manifests:
    - svc.yaml
Steps to reproduce the behavior
skaffold dev
I am seeing the same.
Unless I use a static IP, a Service of type=LoadBalancer never gets an IP on a vanilla GKE cluster:
- if I go to Google Cloud Console, I see an IP for the LB
- but the IP is actually not associated with the LB on Kubernetes API
- overall, hitting the IP doesn't work even though it shows up on the UI
I know at least one more person who deployed the https://github.com/GoogleCloudPlatform/microservices-demo/ and reproed it. So that might be the easiest repro available in open source.
YAML:
apiVersion: v1
kind: Service
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"name":"frontend-external","namespace":"default"},"spec":{"ports":[{"name":"http","port":80,"targetPort":8080}],"selector":{"app":"frontend"},"type":"LoadBalancer"}}
  creationTimestamp: 2018-07-17T19:12:13Z
  labels:
    cleanup: "true"
    deployed-with: skaffold
    docker-api-version: "1.38"
    skaffold-builder: local
    skaffold-deployer: kubectl
    skaffold-tag-policy: git-commit
  name: frontend-external
  namespace: default
  resourceVersion: "4524845"
  selfLink: /api/v1/namespaces/default/services/frontend-external
  uid: 50092bab-89f5-11e8-a2bb-42010a80009c
spec:
  clusterIP: 10.19.250.58
  externalTrafficPolicy: Cluster
  ports:
  - name: http
    nodePort: 30751
    port: 80
    protocol: TCP
    targetPort: 8080
  selector:
    app: frontend
  sessionAffinity: None
  type: LoadBalancer
status:
  loadBalancer: {}
describe output:
Name:                     frontend-external
Namespace:                default
Labels:                   cleanup=true
                          deployed-with=skaffold
                          docker-api-version=1.38
                          skaffold-builder=local
                          skaffold-deployer=kubectl
                          skaffold-tag-policy=git-commit
Annotations:              kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"name":"frontend-external","namespace":"default"},"spec":{"ports":[{"name":"http","por...
Selector:                 app=frontend
Type:                     LoadBalancer
IP:                       10.19.250.58
Port:                     http  80/TCP
TargetPort:               8080/TCP
NodePort:                 http  30751/TCP
Endpoints:                10.16.2.99:8080
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>
I got another person to repro this too.
Progress debugging this: if I do skaffold delete, wait 5 mins (so the underlying GCE networking resources are deleted), and redeploy with skaffold run, I can repro this 100%.
+Bonus: if I do kubectl get -o=yaml service/frontend-external | kubectl apply -f- which causes a "re-apply", then it gets the EXTERNAL-IP right away.
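The re-apply workaround can be wrapped so that it only runs while the external IP is still pending. This is a minimal sketch, not something Skaffold ships: the function names external_ip and reapply_if_pending are made up for illustration, and it assumes the Service lives in the current namespace and that the load balancer reports an IP (not a hostname) under status.loadBalancer.ingress.

```shell
#!/usr/bin/env bash
# Sketch only: re-apply a Service whose external IP is still <pending>.
# Function names are hypothetical; nothing here is part of Skaffold.

# Print the external IP of a Service, or nothing while it is still pending.
external_ip() {
  kubectl get service "$1" \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}

# Re-apply the live object unchanged (the same one-liner as above),
# but only when no external IP has been assigned yet.
reapply_if_pending() {
  local svc="$1"
  if [ -z "$(external_ip "$svc")" ]; then
    kubectl get -o=yaml "service/$svc" | kubectl apply -f -
  fi
}

# Usage (after sourcing this file):
#   reapply_if_pending frontend-external
```

Keeping the check separate from the re-apply means the script is safe to run repeatedly: once the IP is assigned it does nothing.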
AWESOME! Thanks @balopat.
I was seeing the last-applied-configuration even on a clean skaffold run, which got me thinking whether skaffold is applying things twice.
Then I thought "I guess this annotation just exists when you deploy things with kubectl-apply". I shouldn't have thought that. Well at least now we know what to fix. 🥇
an update: we are thinking about how to get around the labelling issue, some of the crappy alternatives that came up are:
1. don't label services at all (works, but not ideal as it's inconsistent)
2. label loadbalancer services only after the external ip is assigned (there might be other issues preventing the assignment)
3. label loadbalancer services after a certain timeout (e.g. 2 minutes is mostly good for GKE)
4. maybe 2 with a timeout and then 3 combined?
5. look again deeper into the design of labelling and rethink it (needs more time)
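The idea of labelling a LoadBalancer service only once its external IP is assigned, with a timeout as a fallback, could be sketched from the command line as below. This is an illustration under assumptions, not Skaffold's implementation: label_after_ip_or_timeout is a hypothetical name, the 120-second default mirrors the "2 minutes" mentioned above, and the 5-second polling interval is arbitrary.

```shell
#!/usr/bin/env bash
# Sketch only: delay labelling until the Service has an external IP,
# or until a timeout expires. The function name is hypothetical.

label_after_ip_or_timeout() {
  local svc="$1" label="$2"
  local timeout="${3:-120}"   # seconds to wait before labelling anyway
  local interval="${4:-5}"    # seconds between polls (arbitrary choice)
  local waited=0 ip=""

  while [ "$waited" -lt "$timeout" ]; do
    # Empty output means the external IP is still <pending>.
    ip="$(kubectl get service "$svc" \
      -o jsonpath='{.status.loadBalancer.ingress[0].ip}')"
    if [ -n "$ip" ]; then
      break
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done

  # Label either because the IP arrived or because the timeout expired.
  kubectl label service "$svc" "$label" --overwrite
}

# Usage (after sourcing this file):
#   label_after_ip_or_timeout frontend-external deployed-with=skaffold
```

Labelling after the timeout even without an IP keeps the labels consistent across resources, at the cost of possibly hitting the original problem on slow provisioning.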
I think ideally this should be fixed in Kubernetes core. The service controller should not be easily confused and get stuck. If you have a reliable repro, please open an issue to kubernetes/kubernetes.
I don't think this is Kubernetes core specific, this looks like a GKE LoadBalancer specific issue. I will open an issue with them though.
repro is super easy:
export app=mysvc; kubectl run $app --image nginx && kubectl expose deployment/$app --port 80 --type LoadBalancer && kubectl edit svc/$app
add a label in the edit command and you'll get the same issue
Kubernetes core specific
The service controller (plus cloudprovider support) is in Kubernetes core (https://github.com/kubernetes/kubernetes/tree/master/pkg/controller/service and https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/gce/gce_loadbalancer_external.go), therefore I recommend opening a GitHub issue. (:
Just wanted to throw in a little +1 on this, experiencing the same issue
It's fixed in kubernetes/kubernetes#68087, but it's currently not picked into any of the 1.12 releases.
Since this is in GKE master and GKE tends to pick up the new k8s versions through a long vetting process (i.e. today the default gke version is 1.9.7, and k8s just released 1.12.0-beta.1), it's unlikely that this will be fixed in GKE in the next 3 months.
It might be worth considering patching this somehow in Skaffold for the short term.
I believe this is fixed with #2568 - I'm not able to reproduce this locally on the latest version (v0.34.0). @thesandlord @ahmetb @balopat could one of you test out and make sure it's working for you as well?
confirmed, this should work now!