l7mp/stunner

Gateway API v1.0 incompatibility on GKE

raviprakash007 opened this issue · 6 comments

I have a Google Cloud Kubernetes (GKE) cluster on which I tried to install STUNner using Helm. These are the steps I followed:

helm repo add stunner https://l7mp.io/stunner
helm repo update
helm install stunner-gateway-operator stunner/stunner-gateway-operator --create-namespace --namespace=stunner-system

Output:

W0218 10:58:13.969185    1641 warnings.go:70] autopilot-default-resources-mutator: Autopilot updated Deployment stunner-system/stunner-auth: adjusted resources to meet requirements for containers [stunner-auth-server] (see http://g.co/gke/autopilot-resources)
W0218 10:58:14.428142    1641 warnings.go:70] autopilot-default-resources-mutator:Autopilot updated Deployment stunner-system/stunner-gateway-operator-controller-manager: adjusted resources to meet requirements for containers [kube-rbac-proxy, manager] (see http://g.co/gke/autopilot-resources)
NAME: stunner-gateway-operator
LAST DEPLOYED: Sun Feb 18 10:57:49 2024
NAMESPACE: stunner-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
The Stunner Gateway operator has been successfully installed.

In order to serve the essential manifests apply the following resources.
You can modify them in order to make them serve your needs.

You must serve some resources to the operator in order to make it work as intended.
You can find everything here:
https://github.com/l7mp/stunner/blob/main/README.md#configuration

To clean up: 
helm uninstall stunner-gateway-operator -n stunner-system

I then checked in Kubernetes that the services and pods are up and running.

[Screenshots attached: Services, ConfigMap (two images), Workloads]

However, the workload logs show:

2024-02-18T10:59:07.579889325Z	INFO	cds-server	Starting CDS server	{"address": "10.0.129.93:13478"}
2024-02-18T10:59:07.579994106Z	INFO	gatewayconfig-controller	created gatewayconfig controller
2024-02-18T10:59:07.580230758Z	INFO	gatewayconfig-controller	watching gatewayconfig objects
2024-02-18T10:59:07.59535982Z	INFO	gatewayconfig-controller	watching secret objects
2024-02-18T10:59:07.595451622Z	INFO	dataplane-controller	created dataplane controller
2024-02-18T10:59:07.595467218Z	INFO	dataplane-controller	watching dataplane objects
2024-02-18T10:59:07.595535955Z	INFO	gateway-controller	created gateway controller
2024-02-18T10:59:07.595549362Z	INFO	gateway-controller	watching gatewayclass objects
2024-02-18T10:59:07.595560428Z	INFO	gateway-controller	watching gateway objects
2024-02-18T10:59:07.599909756Z	ERROR	setup	problem running operator	{"error": "cannot register gateway controller: failed to get API group resources: unable to retrieve the complete list of server APIs: gateway.networking.k8s.io/v1: the server could not find the requested resource"}
2024-02-18T10:59:51.575685073Z	INFO	setup	endpoint discovery	{"state": true}
2024-02-18T10:59:51.57658419Z	INFO	setup	dataplane mode	{"mode": "managed"}
2024-02-18T10:59:51.57665002Z	INFO	setup	config discovery server	{"addr": "10.0.129.93:13478"}
2024-02-18T10:59:51.576662337Z	INFO	setup	setting up Kubernetes controller manager
2024-02-18T10:59:51.586293145Z	INFO	setup	setting up STUNner config renderer
2024-02-18T10:59:51.586600642Z	INFO	setup	setting up updater client

What is the problem, and how can I expose STUNner to the external world?

This is a GKE issue first reported here: envoyproxy/gateway#2301

Some possible workarounds:

  • install an older STUNner version that is still on Gateway API v0.8.0
  • install a GKE "standard mode" cluster and disable Gateway API auto-install
  • wait for Google to fix this on their side (should happen relatively soon)

With the manager image l7mp/stunner-gateway-operator:0.16.0, it is working.

Closing this then. Feel free to reopen if something related comes up. Meanwhile, fingers crossed for Google advancing the Gateway API version to v1 soon.

Would it make sense to link this GitHub issue in the log output, in addition to logging setup problem running operator {"error": "cannot register gateway controller: failed to get API group resources: unable to retrieve the complete list of server APIs: gateway.networking.k8s.io/v1: the server could not find the requested resource"}?

Cool idea. The problem is surfaced by the K8s controller runtime here. We would need to carefully unwrap the error using some advanced errors.Is magic and direct the user here if there's a match for this specific error. I'll try to implement this once I find some spare cycles.