How to combine GPU and PF topology for more reasonable card selection and scheduling

Question

Opened this issue 11 days ago · 1 comment

XiaoDouGeGe commented 11 days ago

What would you like to be added?

I would like Spiderpool to support selecting a suitable VF based on the topology of the GPU and the PF.

Why is this needed?

I think this feature would make better use of the available compute performance, especially in AI training scenarios.

How to implement it (if possible)?

I might start with topology awareness, scheduler improvements, and CNI optimization. As far as I can tell, kubelet allocates VFs effectively at random. I do not plan to change kubelet; instead, I would pass the scheduling result to the CNI.
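To make the topology-awareness part more concrete, here is a rough sketch of how a VF could be ranked against a GPU. It is not Spiderpool code and does not use any Spiderpool or kubelet API. The `PciDevice` type, the `path` field (the chain of PCI bridges from the root complex down to the device), and the sample addresses are all assumptions made up for illustration. The idea is to prefer the VF that shares the deepest PCI ancestor with the GPU, and to fall back to NUMA locality as a tie-breaker.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class PciDevice:
    """Hypothetical view of a PCI device; not a Spiderpool or kubelet type."""
    address: str            # PCI address, e.g. "0000:3d:00.0"
    numa_node: int          # NUMA node the device is attached to
    path: Tuple[str, ...]   # assumed: bridge chain from root complex to device


def shared_depth(a: PciDevice, b: PciDevice) -> int:
    """Count how many leading PCI path components the two devices share."""
    depth = 0
    for x, y in zip(a.path, b.path):
        if x != y:
            break
        depth += 1
    return depth


def affinity(gpu: PciDevice, vf: PciDevice) -> Tuple[int, bool]:
    """Higher is better: deeper shared ancestor first, then same NUMA node."""
    return (shared_depth(gpu, vf), gpu.numa_node == vf.numa_node)


def pick_vf(gpu: PciDevice, vfs: Sequence[PciDevice]) -> Optional[PciDevice]:
    """Return the candidate VF with the best affinity to the GPU, if any."""
    if not vfs:
        return None
    return max(vfs, key=lambda vf: affinity(gpu, vf))


# Made-up sample topology: the GPU and vf_near sit behind the same PCIe switch.
gpu = PciDevice("0000:3d:00.0", 0,
                ("pci0000:3a", "0000:3a:00.0", "0000:3b:00.0", "0000:3c:00.0"))
vf_near = PciDevice("0000:3e:00.2", 0,
                    ("pci0000:3a", "0000:3a:00.0", "0000:3b:00.0", "0000:3c:01.0"))
vf_same_numa = PciDevice("0000:5e:00.2", 0,
                         ("pci0000:3a", "0000:3a:02.0"))
vf_far = PciDevice("0000:af:00.2", 1,
                   ("pci0000:ae", "0000:ae:00.0"))

best = pick_vf(gpu, [vf_far, vf_same_numa, vf_near])
```

With this ranking, the scheduler side would only need to record the chosen VF for the pod, and the CNI would consume that result instead of relying on whichever VF kubelet happened to allocate. How the result is passed along (annotation, CRD, or something else) is left open here.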

Additional context

I hope we can build this feature together.

Answer 1 · 2024-10-28T07:17:03.000Z

Thanks for opening this issue, @XiaoDouGeGe. If there is any progress, we will post an update here.