openkruise/rollouts

[BUG] Available ReplicaSet is scaled down during a paused continuous release

TomQunChaoA opened this issue · 0 comments

Scenario

There are 3 versions of the application:

v1: stable version
v2: buggy canary version
v3: hotfix version

initial

kubectl apply -f v1.yaml
kubectl apply -f update1.yaml
kubectl apply -f v2.yaml

result:

❯ k get rollout
NAME            STATUS        CANARY_STEP   CANARY_STATE   MESSAGE                                                        AGE
rollouts-demo   Progressing   1             StepUpgrade    Rollout is in step(1/4), and upgrade workload to new version   30s
❯ k get pods
NAME                             READY   STATUS         RESTARTS   AGE
workload-demo-7489b6d7-t49rv     0/1     ErrImagePull   0          12s
workload-demo-75cdb8b549-497mq   1/1     Running        0          41s
workload-demo-75cdb8b549-5ct2k   1/1     Running        0          41s
workload-demo-75cdb8b549-76c5r   1/1     Running        0          41s
workload-demo-75cdb8b549-kq76b   1/1     Running        0          41s
workload-demo-75cdb8b549-xxtt6   1/1     Running        0          41s

pause rollout

kubectl patch rollout rollouts-demo -p '{"spec":{"strategy":{"paused":true}}}' --type merge

result:

❯ k get rollout
NAME            STATUS        CANARY_STEP   CANARY_STATE   MESSAGE                                                  AGE
rollouts-demo   Progressing   1             StepUpgrade    Rollout has been paused, you can resume it by kube-cli   85s
❯ k get pods
NAME                             READY   STATUS             RESTARTS   AGE
workload-demo-7489b6d7-t49rv     0/1     ImagePullBackOff   0          65s
workload-demo-75cdb8b549-497mq   1/1     Running            0          94s
workload-demo-75cdb8b549-5ct2k   1/1     Running            0          94s
workload-demo-75cdb8b549-76c5r   1/1     Running            0          94s
workload-demo-75cdb8b549-kq76b   1/1     Running            0          94s
workload-demo-75cdb8b549-xxtt6   1/1     Running            0          94s

apply v3

kubectl apply -f v3.yaml

result:

❯ k get pods
NAME                             READY   STATUS         RESTARTS   AGE
workload-demo-7489b6d7-t49rv     0/1     ErrImagePull   0          2m6s
workload-demo-75cdb8b549-5ct2k   1/1     Running        0          2m35s
workload-demo-75cdb8b549-76c5r   1/1     Running        0          2m35s
workload-demo-75cdb8b549-kq76b   1/1     Running        0          2m35s
workload-demo-75cdb8b549-xxtt6   1/1     Running        0          2m35s

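After applying v3, the stable version (v1) has replicas=4 and available replicas=4; it should still be 5. A healthy v1 pod (workload-demo-75cdb8b549-497mq) was removed, while the failing v2 pod is still present.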

Possibly buggy code

minAvailable := *(deployment.Spec.Replicas) - maxUnavailable
newRSUnavailablePodCount := *(newRS.Spec.Replicas) - newRS.Status.AvailableReplicas
// Should this line also subtract the old ReplicaSets' unavailable pod count?
maxScaledDown := allPodsCount - minAvailable - newRSUnavailablePodCount

POC

v1.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: workload-demo
  namespace: xm
spec:
  replicas: 5
  strategy:
    rollingUpdate:
      maxSurge: 100%
      maxUnavailable: 0
    type: RollingUpdate
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      containers:
      - name: busybox
        image: busybox:latest
        command: ["/bin/sh", "-c", "sleep 100d"]
        env:
        - name: VERSION
          value: "version-1"

v2.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: workload-demo
  namespace: xm
spec:
  replicas: 5
  selector:
    matchLabels:
      app: demo
  strategy:
    rollingUpdate:
      maxSurge: 100%
      maxUnavailable: 0
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: demo
    spec:
      containers:
      - name: busybox
        image: busyboy:latest
        command: ["/bin/sh", "-c", "sleep 100d"]
        env:
        - name: VERSION
          value: "version-2"

v2 uses the image busyboy (instead of busybox) so that the image pull fails and the deployment does not become available.

v3.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: workload-demo
  namespace: xm
spec:
  replicas: 5
  strategy:
    rollingUpdate:
      maxSurge: 100%
      maxUnavailable: 0
    type: RollingUpdate
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      containers:
      - name: busybox
        image: busybox:latest
        command: ["/bin/sh", "-c", "sleep 100d"]
        env:
        - name: VERSION
          value: "version-3"

update.yaml

apiVersion: rollouts.kruise.io/v1alpha1
kind: Rollout
metadata:
  name: rollouts-demo
  namespace: xm
  annotations:
    rollouts.kruise.io/rolling-style: partition
spec:
  objectRef:
    workloadRef:
      apiVersion: apps/v1
      kind: Deployment
      name: workload-demo
  strategy:
    canary:
      steps:
      - replicas: 1
      - replicas: 3
      - replicas: 4
      - replicas: 5