OKG fails to add the network annotation to a Pod when a readinessProbe is used
Background:
My GameServer runs two game processes. Process 1 depends on the OKG network annotation and starts only after reading it. I have also configured a readinessProbe that checks whether process 1's gRPC listener is ready. Process 2 depends on process 1 and starts only once process 1 is ready. This ordering relies on OKG's startup sequence control.
Problem:
With this setup, Pods occasionally fail to receive the network annotation and remain in a pending state indefinitely.
In practice, with the GameServerSet replicas set to 4, I saw one Pod stay in the pending state while the others started normally.
This is my GameServerSet YAML:
apiVersion: game.kruise.io/v1alpha1
kind: GameServerSet
metadata:
  name: gss
  labels:
    gs-group: test
spec:
  replicas: 4
  updateStrategy:
    rollingUpdate:
      podUpdatePolicy: ReCreate
  network:
    networkType: Kubernetes-HostPort
    networkConf:
      - name: ContainerPorts
        value: "process1:5000/TCP"
  gameServerTemplate:
    metadata:
      labels:
        gs-group: test
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 1
              podAffinityTerm:
                topologyKey: "kubernetes.io/hostname"
                labelSelector:
                  matchLabels:
                    gs-group: test
      imagePullSecrets:
        - name: qcloudregistrykey
      containers:
        - image: IMAGE_1
          imagePullPolicy: IfNotPresent
          name: process1
          env:
            - name: KRUISE_CONTAINER_PRIORITY
              value: "2"
          readinessProbe:
            tcpSocket:
              port: 6000
            initialDelaySeconds: 5
            periodSeconds: 5
            successThreshold: 1
            failureThreshold: 3
          volumeMounts:
            - name: network
              mountPath: /opt/network
        - image: IMAGE_2
          imagePullPolicy: IfNotPresent
          name: process2
          env:
            - name: KRUISE_CONTAINER_PRIORITY
              value: "1"
      volumes:
        - name: network
          downwardAPI:
            items:
              - path: "annotations"
                fieldRef:
                  fieldPath: metadata.annotations['game.kruise.io/network-status']
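For context, the wait that process 1 performs on the annotation could look roughly like the sketch below. It is not the actual game code: the function name `wait_for_network_status` is hypothetical, and it assumes the downward API file is missing or empty until OKG writes the annotation, after which it holds a JSON document. The path `/opt/network/annotations` follows from the `mountPath` and item `path` in the manifest above.

```python
import json
import time
from pathlib import Path


def wait_for_network_status(path, timeout=60.0, interval=1.0):
    """Poll the downward API file until it holds a non-empty JSON document.

    Assumption: the file is absent or empty while the annotation is unset,
    and contains JSON once OKG has written the network status.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            raw = Path(path).read_text(encoding="utf-8").strip()
        except FileNotFoundError:
            raw = ""  # volume not populated yet
        if raw:
            try:
                return json.loads(raw)
            except json.JSONDecodeError:
                pass  # treat unparsable content as "not available yet"
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no network status in {path} after {timeout}s")
        time.sleep(interval)


# Hypothetical usage inside process 1, before opening the gRPC listener:
#   status = wait_for_network_status("/opt/network/annotations", timeout=300)
```

If the annotation never arrives, this loop times out and the gRPC listener never opens, so the readinessProbe on process 1 keeps failing.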
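Because the network and the game server Pods become ready asynchronously, OKG provides a trigger mechanism that lets the network plugin keep fetching network information and return it to the GS. However, the trigger currently fires only when the GS is in the Ready state. In the configuration above, process1 cannot pass its readinessProbe until it has read the annotation, so a Pod that misses the annotation initially never reaches Ready and the trigger never fires. This condition is unreasonable, and we will improve it in the next version.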