Incorrect error logging caused by ResourceVersion conflict on SDK Patch call
daniellee opened this issue · 2 comments
What happened:
After upgrading to 1.41, we started getting a couple of new error messages in the logs for the Agones sidecar:
{"message":"the server rejected our request due to an error in our request","severity":"error","time":"2024-08-27T05:11:38.171542173Z"}
and:
{"error":"the server rejected our request due to an error in our request","gsKey":"updateLabel","message":"","queue":"agones.dev.gameservers.xxx","severity":"error","source":"*sdkserver.SDKServer","time":"2024-08-27T00:33:16.155807016Z"}
and:
{"message":"could not update GameServer gameservers/xxx to state Shutdown: the server rejected our request due to an error in our request","severity":"error","stack":["agones.dev/agones/pkg/sdkserver.(*SDKServer).updateState\n\t/go/src/agones.dev/agones/pkg/sdkserver/sdkserver.go:412","agones.dev/agones/pkg/sdkserver.(*SDKServer).syncGameServer\n\t/go/src/agones.dev/agones/pkg/sdkserver/sdkserver.go:331","agones.dev/agones/pkg/util/workerqueue.(*WorkerQueue).processNextWorkItem\n\t/go/src/agones.dev/agones/pkg/util/workerqueue/workerqueue.go:182","agones.dev/agones/pkg/util/workerqueue.(*WorkerQueue).runWorker\n\t/go/src/agones.dev/agones/pkg/util/workerqueue/workerqueue.go:158","agones.dev/agones/pkg/util/workerqueue.(*WorkerQueue).run.func1\n\t/go/src/agones.dev/agones/pkg/util/workerqueue/workerqueue.go:217","k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/go/src/agones.dev/agones/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226","k8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/go/src/agones.dev/agones/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227","k8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/agones.dev/agones/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204","k8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/agones.dev/agones/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161","agones.dev/agones/pkg/util/workerqueue.(*WorkerQueue).run\n\t/go/src/agones.dev/agones/pkg/util/workerqueue/workerqueue.go:217","runtime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1650"],"time":"2024-08-26T10:06:47.046294688Z"}
It is due to the change in #3803 that adds a test Json Patch operation that checks the ResourceVersion and rejects patch updates if there is a conflict. If two calls are made to update a field in the gameserver spec (like setting a label/annotation or changing the state) in quick succession then a conflict can happen.
After the failed patch update, it gets retried and then succeeds - so I don't think this is actually an error and is just how Kubernetes works.
What you expected to happen:
For this error message to not be an error or to be debug logging.
How to reproduce it (as minimally and precisely as possible):
The error message is from Kubernetes and can be recreated with kubectl by setting an older/smaller value for the /metdata/resourceVersion
field for a running gameserver:
kubectl patch -v=8 gameserver xxx-aaa-bbb --type json -p '[{"op":"test","path":"/metadata/resourceVersion","value":"37909512"},{"op":"replace","path":"/status/state","value":"RequestReady"}]'
To recreate with Agones, then two clients setting a label or annotation on the same gameserver at the same time will generate the error.
Anything else we need to know?:
Environment:
- Agones version: 1.41
- Kubernetes version (use
kubectl version
): v1.29.6-gke.1038001 - Cloud provider or hardware configuration: GKE
- Install method (yaml/helm): helm
- Troubleshooting guide log(s):
- Others:
I know why this is -- the conflict doesn't reigster under IsConflict
. Should be easy enough to fix, I'm just trying to work out how to force a conflict in the SDK Sidecar 🤦🏻 😁 so I can at least see it in action.
Got it! Fix coming shortly.