fluxcd/flagger

Possibility of multiple Intervals

shivamnarula opened this issue · 7 comments

Describe the feature

Support for specifying intervals according to stepWeights.

What problem are you trying to solve?
Suppose we specify stepWeights with [10,20,50] and interval as 5m. Shifting traffic from 50% -> 100% in 5 mins seems like a big deal and also we don't want to delay promotion by introducing multiple steps.

Proposed solution

What do you want to happen? Add any considered drawbacks.
A way to specify intervals as a list whose length equals the number of stepWeights given.
No, no drawbacks considered yet.
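
To make the request concrete, here is a minimal Python sketch of how a per-step interval list could be paired with stepWeights. It only illustrates the idea and is not Flagger code: the `pair_steps` and `parse_minutes` helpers, the per-step interval list, and the sample values are all hypothetical.

```python
def parse_minutes(value: str) -> int:
    """Parse a duration such as '5m' into whole minutes (minutes only, for brevity)."""
    if not value.endswith("m") or not value[:-1].isdigit():
        raise ValueError(f"unsupported duration: {value!r}")
    return int(value[:-1])


def pair_steps(step_weights, intervals):
    """Pair each step weight with its own interval.

    Hypothetical rule taken from the proposal: the interval list must have
    exactly one entry per step weight.
    """
    if len(step_weights) != len(intervals):
        raise ValueError("intervals must have one entry per step weight")
    return [(w, parse_minutes(i)) for w, i in zip(step_weights, intervals)]


# Hypothetical values: 5m at the low weights, a longer 10m wait at 50%.
schedule = pair_steps([10, 20, 50], ["5m", "5m", "10m"])
```

With these sample values, each weight carries its own wait time, so only the 50% step is held longer before promotion.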

Any alternatives you've considered?

Is there another way to solve this problem that isn't as good a solution?
No

Do you want to specify intervals for each step weight in the test phase, or do you want to specify step weights and intervals for the 50% -> 100% rollout phase?

I wish to specify intervals for each step weight in the test phase.

Shifting traffic from 50% -> 100% in 5 mins seems like a big deal and also we don't want to delay promotion by introducing multiple steps.

How are you introducing delay, if you add [10,20,50,70,100]? There is no difference from being able to say 50 -> 100 in 10 minutes.

Can you give practical scenarios for using this feature?

We have a few high-QPS services. Going from 5% to 10% in, say, 5 minutes won't hurt the performance of downstream services, whereas going from 50% to 75% or 80% in the same 5 minutes could cause issues if bad code is pushed.
Also, we don't want to add lots of step weights, which could delay promotion for a long time.

What do you guys think about this?

Hi, not the OP, but I have a use case for the feature being requested. We want a longer canary duration at low percentages, but a shorter one at higher percentages. This lets us test basic functionality with low traffic (a lower rate of errors if they happen), while still allowing us to do some load testing afterwards with higher traffic.

For example, if we have [1,2,4,8,16,32] and tolerate about a 2% error rate, we want the first two steps to have a much longer duration (e.g. 10m) and the later steps a shorter one (e.g. 1m). The reason is that lower traffic means a lower error rate, but also a lower traffic rate, so we might not collect enough traffic for confidence.

Right now we're handling this by using [1,2,3,4,5,6,7,8...] with a 1m interval instead.
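
A rough Python sketch of the arithmetic behind this use case, assuming each step is held for exactly its interval and ignoring failed checks or the final promotion. The per-step intervals are the hypothetical feature being requested, and the helper name is made up for illustration. Only the visible prefix of the workaround list is used, since the rest is elided.

```python
def minutes_at_or_below(weights, intervals, max_weight):
    """Sum the minutes spent on steps whose weight is <= max_weight."""
    return sum(m for w, m in zip(weights, intervals) if w <= max_weight)


# Desired schedule: 10m for the first two steps, 1m for the rest.
weights = [1, 2, 4, 8, 16, 32]
intervals = [10, 10, 1, 1, 1, 1]  # minutes per step (hypothetical feature)

total = sum(intervals)                               # minutes for all six steps
low = minutes_at_or_below(weights, intervals, 2)     # minutes at 1% and 2%

# Current workaround: linear weights with a single 1m interval.
workaround_weights = [1, 2, 3, 4, 5, 6, 7, 8]        # visible prefix only
workaround_intervals = [1] * len(workaround_weights)
workaround_low = minutes_at_or_below(workaround_weights, workaround_intervals, 2)
```

Under these assumptions the desired schedule spends 20 of its 24 minutes at 2% traffic or less, while the workaround spends only 2 minutes there.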