[Question]: k8sGPT Operator with Gemini Backend not working as expected


ibakshay opened this issue 2 months ago · 0 comments

Checklist

I've searched for similar issues and couldn't find anything matching
I've included steps to reproduce the behavior

Affected Components

K8sGPT (CLI)
K8sGPT Operator

K8sGPT Version

v0.3.41

Kubernetes Version

v1.30.0

Host OS and its Version

No response

Steps to reproduce

Have a Kubernetes cluster running the k8sGPT Operator.

Apply a K8sGPT manifest with google as the backend. The resulting resource, as shown by kubectl describe:

Name:         k8sgpt-sample
Namespace:    k8sgpt-operator-system
Labels:       <none>
Annotations:  <none>
API Version:  core.k8sgpt.ai/v1alpha1
Kind:         K8sGPT
Metadata:
  Creation Timestamp:  2024-10-12T23:08:41Z
  Finalizers:
    k8sgpt.ai/finalizer
  Generation:        3
  Resource Version:  20685
  UID:               164d7e80-c6df-4e09-8836-c1deabc850af
Spec:
  Ai:
    Anonymized:  true
    Back Off:
      Enabled:      true
      Max Retries:  5
    Backend:        google
    Enabled:        true
    Language:       english
    Model:          gemini-pro
    Secret:
      Key:     google-api-key
      Name:    k8sgpt-sample-secret
  Repository:  ghcr.io/k8sgpt-ai/k8sgpt
  Version:     v0.3.41
Events:        <none>

Create a Kubernetes secret with your Google API key:

kubectl create secret generic k8sgpt-sample-secret --from-literal=google-api-key=<Google API Key> -n k8sgpt-operator-system

Expected behaviour

The spec.details field in the generated Result manifest should contain information about the error, not an empty string. Example:

apiVersion: core.k8sgpt.ai/v1alpha1
kind: Result
metadata:
  creationTimestamp: "2024-10-13T19:34:35Z"
  generation: 1
  labels:
    k8sgpts.k8sgpt.ai/backend: google
    k8sgpts.k8sgpt.ai/name: k8sgpt-sample
    k8sgpts.k8sgpt.ai/namespace: k8sgpt-operator-system
  name: defaultnginxdeployment667cff5d68j7vtv
  namespace: k8sgpt-operator-system
  resourceVersion: "1980"
  uid: aad78f92-2473-4521-8169-fcd04279e2db
spec:
  backend: google
  details: "<Error details should be here>"
  error:
  - text: the last termination reason is Error container=nginx pod=nginx-deployment-667cff5d68-j7vtv
  kind: Pod
  name: default/nginx-deployment-667cff5d68-j7vtv
  parentObject: ""
status: {}

Actual behaviour

The k8sgpt-sample deployment and its pod are created successfully, and k8sGPT is running without issues.
The Result manifests are also created, but the spec.details field is empty. Here’s an example manifest:

apiVersion: core.k8sgpt.ai/v1alpha1
kind: Result
metadata:
  creationTimestamp: "2024-10-13T19:34:35Z"
  generation: 1
  labels:
    k8sgpts.k8sgpt.ai/backend: google
    k8sgpts.k8sgpt.ai/name: k8sgpt-sample
    k8sgpts.k8sgpt.ai/namespace: k8sgpt-operator-system
  name: defaultnginxdeployment667cff5d68j7vtv
  namespace: k8sgpt-operator-system
  resourceVersion: "1980"
  uid: aad78f92-2473-4521-8169-fcd04279e2db
spec:
  backend: google
  details: ""
  error:
  - text: the last termination reason is Error container=nginx pod=nginx-deployment-667cff5d68-j7vtv
  kind: Pod
  name: default/nginx-deployment-667cff5d68-j7vtv
  parentObject: ""
status: {}

Additional Information

I checked the logs of the k8sgpt-sample pod and found the following error:

Request failed. Failed while calling AI provider Google: googleapi: Error 400: * GenerateContentRequest.generation_config.max_output_tokens: max_output_tokens must be positive.

k8sgpt-sample-7c58bdb9d6-b5j2n {"level":"info","ts":1728763974.447422,"caller":"server/server.go:146","msg":"binding metrics to 8081"}
k8sgpt-sample-7c58bdb9d6-b5j2n {"level":"info","ts":1728763974.4475062,"caller":"server/server.go:105","msg":"binding api to 8080"}
k8sgpt-sample-7c58bdb9d6-b5j2n {"level":"info","ts":1728763986.0204685,"caller":"server/log.go:50","msg":"request failed. failed while calling AI provider google: googleapi: Error 400: * GenerateContentRequest.generation_config.max_output_tokens: max_output_tokens must be positive.","duration_ms":4564,"method":"/schema.v1.ServerAnalyzerService/Analyze","request":"backend:\"google\" explain:true anonymize:true language:\"english\" max_concurrency:10 output:\"json\"","remote_addr":"10.244.0.27:58998","status_code":2}

Upon investigation, I found that maxOutputTokens is never set in the K8sGPT manifest, so c.maxTokens in k8sGPT defaults to 0, which triggers the error above.
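
To illustrate the failure mode, here is a minimal Go sketch of how an unset integer ends up as 0 and fails a "must be positive" check. The names clientConfig and validateMaxOutputTokens are hypothetical stand-ins written for this issue, not the actual k8sGPT or Google client code; only the error text is taken from the log above.

```go
package main

import (
	"errors"
	"fmt"
)

// clientConfig is a hypothetical stand-in for the settings the AI client receives.
type clientConfig struct {
	Model     string
	MaxTokens int // an int that is never assigned keeps Go's zero value, 0
}

// validateMaxOutputTokens mimics the check reported by the Google API
// (assumption: the real validation happens server-side).
func validateMaxOutputTokens(n int) error {
	if n <= 0 {
		return errors.New("max_output_tokens must be positive")
	}
	return nil
}

func main() {
	// MaxTokens is never set, as happens when the manifest has no such field.
	cfg := clientConfig{Model: "gemini-pro"}
	fmt.Println(cfg.MaxTokens, validateMaxOutputTokens(cfg.MaxTokens))
}
```

Any positive value passed along with the request would avoid the rejection, which is what the proposal below aims for.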

Proposal:
Add an optional spec.ai.maxOutputToken field with a default value on kind: K8sGPT, which will then be passed to the schemav1.AnalyzeRequest.
If this proposal is acceptable, I would be happy to submit a PR to implement this change.