k8sgpt-ai/k8sgpt-operator

[Question]: I have two questions about its usage. Could you help resolve them?

Opened this issue · 1 comment

Checklist

  • I've searched for similar issues and couldn't find anything matching
  • I've included steps to reproduce the behavior

Affected Components

  • K8sGPT (CLI)
  • K8sGPT Operator

K8sGPT Version

v0.1.6

Kubernetes Version

v1.28.11

Host OS and its Version

Rocky Linux 8.10

Steps to reproduce

1. While the pod error persists, the Result objects are intermittently missing: repeated queries alternate between listing the result and returning "No resources found", as shown below.

# k get results -n monitoring  
NAME                                     KIND   BACKEND
defaultnginxdeployment26b7b6f9774b4wng   Pod    openai
# k get pod
NAME                                 READY   STATUS             RESTARTS   AGE
nginx-deployment2-6b7b6f9774-b4wng   0/1     ImagePullBackOff   0          2m21s
# k get results -n monitoring  
No resources found in monitoring namespace.
# k get results -n monitoring  
NAME                                     KIND   BACKEND
defaultnginxdeployment26b7b6f9774b4wng   Pod    openai

2. The K8sGPT operator prints error logs, although this does not appear to affect functionality.

Created result defaultnginxdeployment56f9d4488hx589
Finished Reconciling k8sGPT
Creating new client for 10.108.197.164:8080
Connection established between 10.108.197.164:8080 and localhost with time out of 1 seconds.
Remote Address : 10.108.197.164:8080 
K8sGPT address: 10.108.197.164:8080
Checking if defaultnginxdeployment56f9d4488hx589 is still relevant
Finished Reconciling k8sGPT with error: Operation cannot be fulfilled on results.core.k8sgpt.ai "defaultnginxdeployment56f9d4488hx589": the object has been modified; please apply your changes to the latest version and try again
2024-07-16T19:49:06Z	ERROR	Reconciler error	{"controller": "k8sgpt", "controllerGroup": "core.k8sgpt.ai", "controllerKind": "K8sGPT", "K8sGPT": {"name":"k8sgpt-sample","namespace":"monitoring"}, "namespace": "monitoring", "name": "k8sgpt-sample", "reconcileID": "2b6fe54e-750e-4731-a364-d689a2665448", "error": "Operation cannot be fulfilled on results.core.k8sgpt.ai \"defaultnginxdeployment56f9d4488hx589\": the object has been modified; please apply your changes to the latest version and try again"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:324
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:265
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:226
Creating new client for 10.108.197.164:8080
Connection established between 10.108.197.164:8080 and localhost with time out of 1 seconds.
Remote Address : 10.108.197.164:8080 
K8sGPT address: 10.108.197.164:8080
Checking if defaultnginxdeployment56f9d4488hx589 is still relevant
Finished Reconciling k8sGPT

Expected behaviour

  1. Is it possible to display results consistently while the underlying error persists?
  2. How can I eliminate these error logs from the K8sGPT operator?

Actual behaviour

No response

Additional Information

Configuration

apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt-sample
  namespace: monitoring
spec:
  ai:
    enabled: true
    model: gpt-3.5-turbo
    backend: openai
    baseUrl: https://api.chatanywhere.tech
    secret:
      name: k8sgpt-sample-secret
      key: openai-api-key
    language: chinese
  noCache: false
  repository: ghcr.io/k8sgpt-ai/k8sgpt
  version: v0.3.8

Hey @yangy30, thanks for raising this.

I will also try to reproduce this, as it seems we can handle the lifecycle of the Result object better.

By the looks of it, the operator is updating the Result object with a stale revision (resourceVersion), and the operation is then retried successfully in the next reconciliation loop.
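
For context, that error is just the API server's optimistic-concurrency check rejecting a write made with a stale resourceVersion, which is also why the next reconciliation succeeds. The usual way to keep it out of the logs is to re-read the object and retry the write with client-go's conflict helper. A rough sketch of that pattern, not the operator's actual code, and with the Result import path assumed:

import (
	"context"

	"k8s.io/client-go/util/retry"
	"sigs.k8s.io/controller-runtime/pkg/client"

	corev1alpha1 "github.com/k8sgpt-ai/k8sgpt-operator/api/v1alpha1" // import path is an assumption
)

// updateResult retries on conflict, re-reading the Result before each attempt
// so the write is always made against the latest resourceVersion.
func updateResult(ctx context.Context, c client.Client, key client.ObjectKey, mutate func(*corev1alpha1.Result)) error {
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		var res corev1alpha1.Result
		if err := c.Get(ctx, key, &res); err != nil {
			return err
		}
		mutate(&res)
		return c.Update(ctx, &res)
	})
}

Until something along those lines is in place, the error itself is harmless, as you observed.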

I am still unsure how the result spec can be empty if the operation is not successful though.

I am wondering if you see any issues in the k8sgpt pod logs. The k8sgpt pod makes the inference call to your AI backend; if that call fails, it may get an empty response back and write it to the Result object.
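
In the meantime, if you want to capture exactly when the Result objects disappear and come back (your first question), a continuous watch on the CR is more reliable than repeated gets. A rough sketch using the client-go dynamic client; only the group/version/resource and the namespace come from your report, the kubeconfig handling and printing are illustrative:

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load the same kubeconfig kubectl uses.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	dyn, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// GVR of the operator's Result CRD (core.k8sgpt.ai/v1alpha1, resource "results").
	gvr := schema.GroupVersionResource{Group: "core.k8sgpt.ai", Version: "v1alpha1", Resource: "results"}

	// Stream ADDED/MODIFIED/DELETED events so every disappearance is visible.
	w, err := dyn.Resource(gvr).Namespace("monitoring").Watch(context.Background(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for ev := range w.ResultChan() {
		if u, ok := ev.Object.(*unstructured.Unstructured); ok {
			fmt.Println(ev.Type, u.GetName())
		}
	}
}

If the DELETED events line up with the Reconciler errors above, that would suggest the operator is dropping and recreating the Result rather than updating it in place.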