oakestra/oakestra-net

Auto extend node subnet


Short

The node subnet is composed of 64 addresses. If a worker requires more addresses, it should perform an additional request to extend its subnetwork.

Proposal

At worker initialization time, the worker requests a subnet of 64 addresses as usual. Then, every time the address space is exhausted because a worker has more than 64 networked containers, we should extend it with a new request that assigns a second subnet to that worker.

One possible solution would be to:
-> Request a new subnet when the current one is exhausted, inside env.generateAddress()
-> Store the addresses of the new subnet inside env.addrCache
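As a rough illustration of that flow, here is a minimal Go sketch. Only the names env.generateAddress() and env.addrCache come from the proposal above; Environment and requestNewSubnet() are invented stand-ins, not the actual NetManager API:

```go
package env

import (
	"fmt"
	"net"
)

// Environment is a hypothetical stand-in for NetManager's environment state;
// only addrCache follows the naming used in this proposal.
type Environment struct {
	addrCache []net.IP // free addresses from the subnet(s) assigned so far
}

// requestNewSubnet stands in for the request to the cluster manager that
// would assign an additional 64-address subnet (a /26) to this worker.
func (e *Environment) requestNewSubnet() ([]net.IP, error) {
	// ... send the extension request, enumerate the assigned subnet ...
	return nil, fmt.Errorf("not implemented: cluster request")
}

// generateAddress pops the next free address; on exhaustion it first tries
// to extend the worker's address space with an additional subnet.
func (e *Environment) generateAddress() (net.IP, error) {
	if len(e.addrCache) == 0 {
		fresh, err := e.requestNewSubnet()
		if err != nil {
			return nil, fmt.Errorf("address space exhausted and extension failed: %w", err)
		}
		e.addrCache = append(e.addrCache, fresh...)
	}
	ip := e.addrCache[0]
	e.addrCache = e.addrCache[1:]
	return ip, nil
}
```

This keeps the extension transparent to callers: deployment code still calls generateAddress() and never needs to know whether the address came from the first or a later subnet.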

Rationale

Remove the limitation on the number of containers per node.

Impact

NetManager, possibly also the Cluster Manager.

Development time

1 week

Status

Finding a solution

Checklist

  • Discussed
  • Documented
  • Implemented
  • Tested

@smnzlnsk what do you think about it?

A couple of points that popped up:

  • When would this be needed? When a cluster is completely out of options? Won't this create a 'super' node if the deployed services are idle but the scheduler keeps deploying on that one node because, from a scheduling standpoint, it seems fine? Would that node keep requesting address space, or is there an upper limit planned (see the sketch after this list)?
  • Are we planning on making the scheduler respect the available addresses of worker nodes?
  • Assume we allow this and have a weak node. If that node keeps getting chosen by the scheduler and keeps receiving deployments whose services all suddenly see a surge in traffic, causing the node to crash, won't this cause even more re-scheduling effort?
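To make the upper-limit question concrete: extending the sketch above, a per-worker cap could bound how much address space a single node accumulates. maxSubnetsPerNode is an invented configuration knob, not an existing Oakestra setting:

```go
// Hypothetical guard: refuse to request yet another subnet once a worker
// already holds maxSubnetsPerNode of them (invented knob, sketch only).
const maxSubnetsPerNode = 4 // e.g. at most 4 × 64 = 256 addresses per worker

func (e *Environment) canExtend(assignedSubnets int) error {
	if assignedSubnets >= maxSubnetsPerNode {
		return fmt.Errorf("worker already holds %d subnets (limit %d)",
			assignedSubnets, maxSubnetsPerNode)
	}
	return nil
}
```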

I think there are a lot of variables we need to respect before going ahead with this. In general, though, this seems like a good idea, iff the node will be able to withstand the higher strain in the future (and the scheduler does not discriminate).

Ideally, I think that we should not be limited by address space but by actual resources. If a node has run out of resources, the scheduler will not (or should not) send it new deployments regardless. If, instead, a node is capable of handling new workloads according to the SLA but has run out of addresses, it should request more, I guess.

@smnzlnsk What do you think of the solution in #156?