openkruise/kruise-game

OKG failing to correctly add network annotation to Pod when using readinessProbe


Background:

In my GameServer, I have configured two game processes. Process 1 relies on the OKG network annotation and starts only after reading it. I have also set up a readinessProbe that checks whether this process's gRPC listener is ready. Process 2 depends on process 1 and starts only after process 1 is ready. This setup uses OKG's startup sequence control.

Problem:

With this setup, Pods occasionally fail to retrieve the network annotation and then remain in a pending state indefinitely.
In my actual usage, with the GameServerSet replicas set to 4, one Pod stayed in the pending state while the other three started normally.

This is my GameServerSet YAML:

apiVersion: game.kruise.io/v1alpha1
kind: GameServerSet
metadata:
  name: gss
  labels:
    gs-group: test
spec:
  replicas: 4
  updateStrategy:
    rollingUpdate:
      podUpdatePolicy: ReCreate
  network:
    networkType: Kubernetes-HostPort
    networkConf:
    - name: ContainerPorts
      value: "process1:5000/TCP"
  gameServerTemplate:
    metadata:
      labels:
        gs-group: test
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            podAffinityTerm:
              topologyKey: "kubernetes.io/hostname"
              labelSelector:
                matchLabels:
                  gs-group: test
      imagePullSecrets:
        - name: qcloudregistrykey
      containers:
        - image: IMAGE_1
          imagePullPolicy: IfNotPresent
          name: process1
          env:
          - name: KRUISE_CONTAINER_PRIORITY
            value: "2"
          readinessProbe:
            tcpSocket:
              port: 6000
            initialDelaySeconds: 5
            periodSeconds: 5
            successThreshold: 1
            failureThreshold: 3
          volumeMounts:
            - name: network
              mountPath: /opt/network
        - image: IMAGE_2
          imagePullPolicy: IfNotPresent
          name: process2
          env:
          - name: KRUISE_CONTAINER_PRIORITY
            value: "1"
      volumes:
      - name: network
        downwardAPI:
          items:
          - path: "annotations"
            fieldRef:
              fieldPath: metadata.annotations['game.kruise.io/network-status']
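
For reference, here is a minimal sketch of how process1 could wait for the annotation exposed by the downwardAPI volume above. It is not my actual game code: the function name, timeout, and polling interval are hypothetical, and the file path is only derived from the mountPath and item path in the manifest. It relies on the kubelet refreshing downward API volume files after the annotation is added.

```python
import time
from pathlib import Path

# Path assumed from the manifest above: volume "network" mounted at
# /opt/network, with the downwardAPI item path "annotations".
ANNOTATION_FILE = Path("/opt/network/annotations")


def wait_for_network_annotation(path=ANNOTATION_FILE, timeout=60.0, interval=1.0):
    """Poll the downward API file until it has content or the timeout expires.

    Returns the raw annotation text. Raises TimeoutError instead of
    blocking forever when the annotation never arrives.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            content = Path(path).read_text(encoding="utf-8").strip()
        except FileNotFoundError:
            # The file may not exist yet; treat it the same as empty.
            content = ""
        if content:
            return content
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no network annotation in {path} after {timeout}s")
        time.sleep(interval)
```

Failing with a timeout at least makes the stuck Pod visible in the container logs instead of leaving it waiting silently.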

Because the network and the game server Pod become ready asynchronously, OKG provides a trigger mechanism that lets the network plugin keep fetching network information and return it to the GS. However, the trigger currently fires only when the GS is in the Ready state, which is unreasonable. We will improve this in the next version.
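
To make the circular dependency concrete, here is a toy model of one reading of the behavior described above. It is not OKG code: `simulate`, its parameters, and the round-based loop are all hypothetical, and it assumes that the first attempt to write the annotation can miss because the network information is not available yet.

```python
def simulate(first_write_succeeds, trigger_requires_ready, max_rounds=5):
    """Toy model: return True if the Pod becomes Ready within max_rounds."""
    # Initial attempt by the network plugin; may miss if the network
    # information is not available yet (assumption of this sketch).
    annotation_written = first_write_succeeds
    pod_ready = False
    for _ in range(max_rounds):
        # Retry path: the trigger lets the plugin write the annotation later.
        trigger_fires = pod_ready or not trigger_requires_ready
        if not annotation_written and trigger_fires:
            annotation_written = True
        # Pod side: process1 starts listening only after reading the
        # annotation, and the readinessProbe passes only once it listens.
        if annotation_written:
            pod_ready = True
        if pod_ready:
            return True
    return False


# First write lands in time: the Pod becomes Ready either way.
print(simulate(first_write_succeeds=True, trigger_requires_ready=True))    # True
# First write misses and the retry needs Ready: the Pod is stuck.
print(simulate(first_write_succeeds=False, trigger_requires_ready=True))   # False
# First write misses but the retry does not need Ready: the Pod recovers.
print(simulate(first_write_succeeds=False, trigger_requires_ready=False))  # True
```

Under these assumptions only the Pods whose first write misses get stuck, which would match the occasional nature of the problem reported in this issue.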