[BUG] Available replicas of the stable ReplicaSet are scaled down during a paused continuous release
TomQunChaoA opened this issue · 0 comments
Scenario
There are 3 versions of the application:
v1: stable version
v2: buggy canary version
v3: hotfix version
initial
kubectl apply -f v1.yaml
kubectl apply -f update.yaml
kubectl apply -f v2.yaml
result:
❯ k get rollout
NAME            STATUS        CANARY_STEP   CANARY_STATE   MESSAGE                                                        AGE
rollouts-demo   Progressing   1             StepUpgrade    Rollout is in step(1/4), and upgrade workload to new version   30s
❯ k get pods
NAME                             READY   STATUS         RESTARTS   AGE
workload-demo-7489b6d7-t49rv     0/1     ErrImagePull   0          12s
workload-demo-75cdb8b549-497mq   1/1     Running        0          41s
workload-demo-75cdb8b549-5ct2k   1/1     Running        0          41s
workload-demo-75cdb8b549-76c5r   1/1     Running        0          41s
workload-demo-75cdb8b549-kq76b   1/1     Running        0          41s
workload-demo-75cdb8b549-xxtt6   1/1     Running        0          41s
pause rollout
kubectl patch rollout rollouts-demo -p '{"spec":{"strategy":{"paused":true}}}' --type merge
result:
❯ k get rollout
NAME            STATUS        CANARY_STEP   CANARY_STATE   MESSAGE                                                  AGE
rollouts-demo   Progressing   1             StepUpgrade    Rollout has been paused, you can resume it by kube-cli   85s
❯ k get pods
NAME                             READY   STATUS             RESTARTS   AGE
workload-demo-7489b6d7-t49rv     0/1     ImagePullBackOff   0          65s
workload-demo-75cdb8b549-497mq   1/1     Running            0          94s
workload-demo-75cdb8b549-5ct2k   1/1     Running            0          94s
workload-demo-75cdb8b549-76c5r   1/1     Running            0          94s
workload-demo-75cdb8b549-kq76b   1/1     Running            0          94s
workload-demo-75cdb8b549-xxtt6   1/1     Running            0          94s
apply v3
kubectl apply -f v3.yaml
result:
❯ k get pods
NAME                             READY   STATUS         RESTARTS   AGE
workload-demo-7489b6d7-t49rv     0/1     ErrImagePull   0          2m6s
workload-demo-75cdb8b549-5ct2k   1/1     Running        0          2m35s
workload-demo-75cdb8b549-76c5r   1/1     Running        0          2m35s
workload-demo-75cdb8b549-kq76b   1/1     Running        0          2m35s
workload-demo-75cdb8b549-xxtt6   1/1     Running        0          2m35s
After applying v3, the stable ReplicaSet is left with replicas=4 and available replicas=4, but it should still be 5: with spec.replicas=5 and maxUnavailable=0, at least minAvailable = 5 - 0 = 5 pods must stay available, yet an available v1 pod (497mq) was scaled down while the unavailable v2 pod was kept.
Suspected bug code
rollouts/pkg/controller/deployment/rolling.go, line 131 at 5626a7f:
minAvailable := *(deployment.Spec.Replicas) - maxUnavailable
newRSUnavailablePodCount := *(newRS.Spec.Replicas) - newRS.Status.AvailableReplicas
// should this line also subtract the old RSs' unavailable pod count?
maxScaledDown := allPodsCount - minAvailable - newRSUnavailablePodCount
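To make the numbers concrete, here is a minimal runnable sketch that plugs the POC state into the computation above; oldRSUnavailablePodCount is a hypothetical name for the suggested extra term, and the v3 ReplicaSet is assumed to still be at 0 replicas (no v3 pod appears in the listing above):

package main

import "fmt"

func main() {
	// State when v3.yaml is applied while the rollout is still paused:
	//   v1 (stable) RS: 5 replicas, 5 available
	//   v2 (canary) RS: 1 replica,  0 available (ErrImagePull)
	//   v3 (new)    RS: 0 replicas
	allPodsCount := int32(5 + 1 + 0)
	minAvailable := int32(5 - 0)         // deployment.Spec.Replicas - maxUnavailable
	newRSUnavailablePodCount := int32(0) // the v3 RS has not been scaled up yet

	// Current computation: the budget is 1, and the controller spends it on
	// an available v1 pod, leaving 4 available replicas, below minAvailable.
	maxScaledDown := allPodsCount - minAvailable - newRSUnavailablePodCount
	fmt.Println("current maxScaledDown:", maxScaledDown) // 1

	// Suggested adjustment (an assumption, not a confirmed patch): also
	// subtract the old ReplicaSets' unavailable pods, so the budget cannot
	// be spent on available ones.
	oldRSUnavailablePodCount := int32(1) // the v2 ErrImagePull pod
	fmt.Println("adjusted maxScaledDown:", maxScaledDown-oldRSUnavailablePodCount) // 0
}

With the adjusted budget, the paused rollout would keep all 5 available v1 pods in place until the unhealthy v2 pod is cleaned up.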
POC
v1.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: workload-demo
  namespace: xm
spec:
  replicas: 5
  strategy:
    rollingUpdate:
      maxSurge: 100%
      maxUnavailable: 0
    type: RollingUpdate
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      containers:
        - name: busybox
          image: busybox:latest
          command: ["/bin/sh", "-c", "sleep 100d"]
          env:
            - name: VERSION
              value: "version-1"
v2.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: workload-demo
  namespace: xm
spec:
  replicas: 5
  selector:
    matchLabels:
      app: demo
  strategy:
    rollingUpdate:
      maxSurge: 100%
      maxUnavailable: 0
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: demo
    spec:
      containers:
        - name: busybox
          image: busyboy:latest
          command: ["/bin/sh", "-c", "sleep 100d"]
          env:
            - name: VERSION
              value: "version-2"
The image busyboy:latest does not exist, which guarantees the v2 deployment fails.
v3.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: workload-demo
  namespace: xm
spec:
  replicas: 5
  strategy:
    rollingUpdate:
      maxSurge: 100%
      maxUnavailable: 0
    type: RollingUpdate
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      containers:
        - name: busybox
          image: busybox:latest
          command: ["/bin/sh", "-c", "sleep 100d"]
          env:
            - name: VERSION
              value: "version-3"
update.yaml
apiVersion: rollouts.kruise.io/v1alpha1
kind: Rollout
metadata:
  name: rollouts-demo
  namespace: xm
  annotations:
    rollouts.kruise.io/rolling-style: partition
spec:
  objectRef:
    workloadRef:
      apiVersion: apps/v1
      kind: Deployment
      name: workload-demo
  strategy:
    canary:
      steps:
        - replicas: 1
        - replicas: 3
        - replicas: 4
        - replicas: 5