radon-h2020/radon-particles

TOSCA AutoScaling

cankarm opened this issue · 9 comments

The current version of the autoscaling policy defines the boundaries of scaling (min/max),
but it does not explain much about when scaling can happen. Where will the thresholds be defined? And how is the orchestrator triggered when they are reached?

I'm wondering if a user really needs to specify the thresholds for auto-scaling. I think this is handled behind the scenes by the "system", either based on fixed values or based on monitoring and a calculated baseline for each service.

Sure. Who sets these things in monitoring then? Should this be done during deployment? I imagine that should be perfectly in line with:

  1. the blueprint and future blueprint changes
  2. the monitoring configuration (e.g., where to send the requests for scaling)

Will this be the job for the orchestrator?

So when monitoring sends a notification: "Hey, we are outnumbered here, load is 42, please give us more power." What will you do? Start another instance, or two, or five? Who decides whether load 42 is just a little bit high or substantially over the top?

or

Monitoring will know what to do with "load 42" and say: "Hi orchestrator, I need exactly 3 instances of XYZ. Please provide them here." This means that the whole scaling logic (not only what the thresholds are) lives near the monitoring.

For me it is a bit strange if you do not have the logic for scaling near the logic for deployment.

Moreover .... Fixed values? What fixed values?

I would very much prefer a declarative description of the thresholds for scaling, for example (a sketch follows the list below):

  • Scale up when cpu/memory > 80%
  • Scale down when cpu/memory < 20%
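
A minimal sketch of how such a declarative description could look as policy properties (the threshold property names below are hypothetical, not part of the current type definition):

my_node_auto_scaling:
  type: radon.policies.scaling.AutoScale
  properties:
    min_instances: 1
    max_instances: 10
    cpu_upper_bound: 80   # scale out when CPU utilization exceeds 80 %
    cpu_lower_bound: 20   # scale in when CPU utilization drops below 20 %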

@naesheim I like this. Probably we should also have a grace period. For example, CPU must be over 80% for 20 seconds before some action is triggered. Or maybe have different grace periods for different cases, like:

  • if over 75% for 20 seconds: trigger
  • if over 95% for 10 seconds: trigger

Similar for downscaling, of course.

What do you think of that?
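
One way to express such graded grace periods declaratively could be a list of trigger rules, each combining a threshold with a duration (again just a sketch; scale_out_rules/scale_in_rules and their fields are made-up names):

properties:
  scale_out_rules:
    - { metric: cpu, threshold: 75, duration: 20s, adjustment: 1 }
    - { metric: cpu, threshold: 95, duration: 10s, adjustment: 2 }
  scale_in_rules:
    - { metric: cpu, threshold: 20, duration: 60s, adjustment: -1 }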

To continue this conversation: this YAML snippet does not hold any relation to the node or type of nodes that the auto-scaling affects.

Which snippet do you mean? This one? ... That's just the type definition.

One has to instantiate this type in a topology template:

topology_template:
  policies:
    my_node_auto_scaling:
      type: radon.policies.scaling.AutoScale
      ...
      targets: [ ... ]
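
For illustration, the targets list then ties the policy to concrete node templates defined in the same topology (the node template my_vm below is made up):

topology_template:
  node_templates:
    my_vm:
      type: tosca.nodes.Compute
  policies:
    my_node_auto_scaling:
      type: radon.policies.scaling.AutoScale
      properties:
        min_instances: 1
        max_instances: 5
      targets: [ my_vm ]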

Continuing on the threshold discussion... In that case you would apply the ScaleIn and/or ScaleOut policies.

Auto-scaling, in my opinion, means that these thresholds cannot be defined by users. These are set automatically by the "system". However, we most probably require more properties in order to affect the auto-scaling. On top of min and max instances, it's probably useful to set the default_instances and the number of instances to increment until max is reached.
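
A sketch of what such an extended type definition could look like (min/max follow the existing snippet; default_instances and increment are the proposed additions, and the exact parent type may differ):

radon.policies.scaling.AutoScale:
  derived_from: tosca.policies.Scaling
  properties:
    min_instances:     { type: integer, default: 1 }
    max_instances:     { type: integer, default: 1 }
    default_instances: { type: integer, required: false }  # initial number of instances
    increment:         { type: integer, default: 1 }       # instances added per scaling step until max is reached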

@miwurster yes, I was referring to that exact snippet.
I interpreted auto-scaling as scaling done automatically without user intervention.

If auto-scaling is interpreted as you suggested, then we need an "on/off" switch for it, or min=max instances covers that exact scenario. However, this kind of auto-scaling is very appropriate for FaaS, but then the number of instances is not the right approach for all providers, only for some (I presume that OpenFaaS would be one).

And on which occasions do we need non-auto-scaling or ScaleIn/ScaleOut? Why would we have two different approaches?

I think we are talking about two different things here. First of all, I don't think that we use such an auto-scaling policy with FaaS ... as FaaS is implicitly auto-scaled and completely managed by the underlying provider/platform (and therefore the scaling settings cannot be influenced by the end-user - at least that is what I have seen in practice so far).
However, the auto-scaling policy from the RADON Particles is rather intended to be used in the data pipeline context. In this context, we want to annotate a data pipeline stack to automatically scale depending on a "certain workload".

Now, when it comes to specifying such a "certain workload", there are two directions we can go, IMHO.

On the one hand, we can give the user full flexibility in specifying this. So the end-user defines the "metric" (e.g., CPU or memory load) and a "value" that must not be exceeded or undercut. Further, the user defines the "number" of instances he or she wants to add/remove. Having that, our system (orchestrator + monitoring) could employ the following logic:

Monitor <Metric> every <Time>

If (<Metric> <Operator> <Value>) {
	Add | Remove <Resource> by <Number>
}
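
Mapped onto a policy definition, those placeholders could become properties roughly like this (all names are illustrative only):

properties:
  metric: cpu_load          # <Metric>
  evaluation_period: 30s    # <Time>
  comparison: greater_than  # <Operator>
  threshold: 80             # <Value>
  adjustment: 1             # <Number> of instances to add or remove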

On the other hand, if we take a look at AWS Beanstalk and its auto-scaling behavior, the user only has to define the min and max number of instances. Beanstalk uses some reasonable defaults to trigger the scaling. So, it's kind of optional for the end-user to make special adjustments depending on their needs.

All I'm trying to say or ask is: how much configuration do we want to expose to the end-user? It's not that I'm in favor of one or the other option. I would actually tend towards the "easier" solution ;-)

Regarding the FaaS scaling:

  • in the case of Lambda, there are limits on the number of concurrent invocations.
  • in the case of OpenFaaS, afaik, everything is done by containers on a specific node (or pool of nodes). So when the node (pool) is exhausted, a new node needs to be added to the pool.

For the other providers I have not yet explored what the options are. But it is possible to make some adjustments in the Lambda configuration, and it is probably also very important to review the others that can suffer even more (as in the case of OpenFaaS).
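
If such provider limits should end up in the model at all, one could imagine exposing them as node properties, e.g. for a Lambda-backed function (both the node type name and the property below are hypothetical; AWS itself calls this "reserved concurrency"):

my_function:
  type: radon.nodes.aws.LambdaFunction  # indicative name only
  properties:
    concurrency_limit: 100              # cap on concurrent invocations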

For the other scaling that you mention, I agree, and we were probably on the same page from the start. The "reasonable defaults" you mention are a good way to go; it is probably just me who sees this as something to configure rather than as "defaults". If I understand correctly, for Beanstalk the default is based only on network load? But the user can go into much more detail.

What should RADON support? I think that your approach with the logic above is good, and we should keep an eye on the Beanstalk definitions so that we do not reinvent the wheel. We can propose the most obvious metrics to cover and ask the use cases (UCs) if this is enough.