oakestra/oakestra-net

Auto extend node subnet


Short

The node subnet is composed of 64 addresses. If a worker requires more addresses, it should perform an additional request to extend its subnetwork.

Proposal

At worker initialization time, the worker requests a subnet of 64 addresses as usual. Then, every time the address space is exhausted because a worker has more than 64 networked containers, we should extend it with a new request that assigns a second subnet to that worker.

One possible solution would be to:
-> Request a new subnet when the current one is exhausted, inside env.generateAddress()
-> Store the addresses of the new subnet inside env.addrCache
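As a rough illustration of that flow, here is a minimal Go sketch. Only the names env.generateAddress() and env.addrCache come from the proposal above; Environment and requestNewSubnet() are invented stand-ins, not the actual NetManager API:

```go
package env

import (
	"fmt"
	"net"
)

// Environment is a hypothetical stand-in for NetManager's environment state;
// only addrCache follows the naming used in this proposal.
type Environment struct {
	addrCache []net.IP // free addresses from the subnet(s) assigned so far
}

// requestNewSubnet stands in for the request to the cluster manager that
// would assign an additional 64-address subnet (a /26) to this worker.
func (e *Environment) requestNewSubnet() ([]net.IP, error) {
	// ... send the extension request, enumerate the assigned subnet ...
	return nil, fmt.Errorf("not implemented: cluster request")
}

// generateAddress pops the next free address; on exhaustion it first tries
// to extend the worker's address space with an additional subnet.
func (e *Environment) generateAddress() (net.IP, error) {
	if len(e.addrCache) == 0 {
		fresh, err := e.requestNewSubnet()
		if err != nil {
			return nil, fmt.Errorf("address space exhausted and extension failed: %w", err)
		}
		e.addrCache = append(e.addrCache, fresh...)
	}
	ip := e.addrCache[0]
	e.addrCache = e.addrCache[1:]
	return ip, nil
}
```

This keeps the extension transparent to callers: deployment code still calls generateAddress() and never needs to know whether the address came from the first or a later subnet.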

Rationale

Remove the limitation on the number of containers per node.

Impact

NetManager, possibly also the Cluster Manager.

Development time

1 week

Status

Finding a solution

Checklist

  • Discussed
  • Documented
  • Implemented
  • Tested

@smnzlnsk what do you think about it?

A couple of points that popped up:

  • When would this be needed? When a cluster is completely out of options? Won't this create a 'super' node if the deployed services are idle but the scheduler keeps deploying on that one node because, from a scheduling standpoint, it seems fine? Would that node keep requesting address space, or is there an upper limit planned (see the sketch after this list)?
  • Are we planning on making the scheduler respect the available addresses of worker nodes?
  • Assume we allow this and have a weak node. If that node keeps getting chosen by the scheduler and keeps receiving deployments whose services all suddenly see a surge in traffic, causing the node to crash, won't this cause even more re-scheduling effort?
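To make the upper-limit question concrete: extending the sketch above, a per-worker cap could bound how much address space a single node accumulates. maxSubnetsPerNode is an invented configuration knob, not an existing Oakestra setting:

```go
// Hypothetical guard: refuse to request yet another subnet once a worker
// already holds maxSubnetsPerNode of them (invented knob, sketch only).
const maxSubnetsPerNode = 4 // e.g. at most 4 × 64 = 256 addresses per worker

func (e *Environment) canExtend(assignedSubnets int) error {
	if assignedSubnets >= maxSubnetsPerNode {
		return fmt.Errorf("worker already holds %d subnets (limit %d)",
			assignedSubnets, maxSubnetsPerNode)
	}
	return nil
}
```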

I think there are a lot of variables we need to respect before going ahead with this. In general, though, this seems like a good idea, iff the node will be able to withstand the higher strain in the future (and the scheduler does not discriminate).

Ideally, I think that we should not be limited by address space but by actual resources. If a node has run out of resources, the scheduler will not (or should not) send it new deployments regardless. If, instead, a node is capable of handling new workloads according to the SLA but has run out of addresses, it should request more, I guess.

@smnzlnsk What do you think of the solution in #156?