milvus-io/milvus

[Enhancement]: Resource Group Declarative API

Closed · 1 comment

Is there an existing issue for this?

  • I have searched the existing issues

What would you like to be added?

  • Declarative Resource Group API

Why is this needed?

Milvus currently has the functionality of Resource Groups for query node-level resource isolation, as detailed in the design document:

We aim to use this capability to achieve hotspot isolation and resource management within the cluster. The main enhancements include:

  • Declarative Resource Group node management API.
    • The current Resource Group operation is transfer_node, which is not fully idempotent and leaves little room for more complex Resource Group controls in the future (such as control at the Database or Collection level, allowing nodes to migrate freely between different RGs).
    • Following systems like Kubernetes, provide a declarative API to modify Resource Group configurations, and change the current capacity logic to use requests + limits.
    • Attempt to remove the special status of DefaultRG (it remains non-deletable and is created by default) so that new nodes can join other RGs directly without first entering DefaultRG.
    • Deprecate the original TransferNode interface. Once the declarative API is complete, only three code paths will change a node's RG: NodeHangUp/NodeDown/AutoRecover.
    • Move the logic that assigns nodes to replicas into an Observer instead of dispersing it across various APIs. All node changes will be handled by ReplicaObserver.
  • Replica Manager Enhancement
    • Allow TransferNode between two RGs that have loaded the same collection.
    • Allow a Replica to be transferred between two RGs that have loaded the same collection.
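
To illustrate why a declarative requests + limits model is idempotent where transfer_node is not, here is a minimal Python sketch. It is not the Milvus API: `ResourceGroupSpec`, `reconcile`, and the reading of `requests` as the minimum node count and `limits` as the maximum are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceGroupSpec:
    """Hypothetical declarative config for one resource group."""
    requests: int  # assumed meaning: minimum number of nodes the group wants
    limits: int    # assumed meaning: maximum number of nodes the group may hold


def reconcile(specs, assignment, free_nodes):
    """Drive the observed node assignment toward the declared specs.

    specs:      dict of group name -> ResourceGroupSpec (desired state)
    assignment: dict of group name -> list of node ids (observed state)
    free_nodes: list of node ids not assigned to any group
    Returns (new_assignment, remaining_free_nodes); inputs are not mutated.
    """
    state = {name: list(assignment.get(name, [])) for name in specs}
    pool = list(free_nodes)

    # Nodes held by groups that are no longer declared go back to the pool.
    for name, nodes in assignment.items():
        if name not in specs:
            pool.extend(nodes)

    # Release nodes from groups that exceed their limits.
    for name, spec in specs.items():
        while len(state[name]) > spec.limits:
            pool.append(state[name].pop())

    # Fill groups that are below their requests, as far as the pool allows.
    for name, spec in specs.items():
        while len(state[name]) < spec.requests and pool:
            state[name].append(pool.pop(0))

    return state, pool


specs = {
    "rg1": ResourceGroupSpec(requests=2, limits=3),
    "rg2": ResourceGroupSpec(requests=1, limits=1),
}
observed = {"rg1": ["n1"], "rg2": ["n2", "n3"]}

first, free = reconcile(specs, observed, ["n4"])
# Applying the same declaration again changes nothing.
second, free_again = reconcile(specs, first, free)
```

Repeating an imperative call such as "move one node from rg2 to rg1" moves another node each time, whereas re-applying the same declaration leaves the converged state untouched. That property is what would let a single observer loop own all node changes.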

Anything else?

No response

All Milvus-related PRs have been merged into the master branch.