spidernet-io/spiderpool

How to combine GPU and PF topology for more reasonable card selection and scheduling

What would you like to be added?

I would like Spiderpool to support selecting a suitable VF based on the topology of the GPU and the PF.

Why is this needed?

I think this feature would make better use of the available compute performance, especially in AI training scenarios, where the GPU and the NIC it uses should sit close to each other in the hardware topology.

How to implement it (if possible)?

I would start with three areas: topology awareness, scheduler improvements, and CNI optimization. It seems that the kubelet can only allocate VFs at random. I do not plan to change the kubelet; instead, the scheduling result would be passed to the CNI.
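
To make the idea concrete, here is a minimal sketch of the selection step only. It is an illustration under assumptions, not Spiderpool's API: the `PciDevice` record, its fields, and the `pick_vf` helper are all hypothetical, and a real implementation would fill in the topology from the node (for example from sysfs) rather than from hand-written values. The sketch prefers the free VF whose parent PF shares the deepest PCIe path with the GPU, and falls back to NUMA locality as a tie-breaker.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class PciDevice:
    """Hypothetical device record; not a Spiderpool or kubelet type."""
    name: str
    numa_node: int
    # Bridges from the root port down to the device. For a VF this is
    # assumed to be the path of its parent PF.
    pcie_path: Tuple[str, ...]


def shared_depth(a: PciDevice, b: PciDevice) -> int:
    """Count how many leading PCIe hops two devices have in common."""
    depth = 0
    for hop_a, hop_b in zip(a.pcie_path, b.pcie_path):
        if hop_a != hop_b:
            break
        depth += 1
    return depth


def pick_vf(gpu: PciDevice, free_vfs: Sequence[PciDevice]) -> Optional[PciDevice]:
    """Return the free VF closest to the GPU, or None if there is none.

    Ranking: deepest shared PCIe path first, then same NUMA node.
    """
    if not free_vfs:
        return None
    return max(
        free_vfs,
        key=lambda vf: (shared_depth(gpu, vf), vf.numa_node == gpu.numa_node),
    )


# Example topology (made-up addresses).
gpu0 = PciDevice("gpu0", 0, ("0000:00:01.0", "0000:01:00.0"))
free = [
    PciDevice("pf1-vf0", 1, ("0000:80:01.0",)),                # other socket
    PciDevice("pf2-vf0", 0, ("0000:00:03.0",)),                # same NUMA node
    PciDevice("pf0-vf3", 0, ("0000:00:01.0", "0000:01:00.0")),  # same switch
]
print(pick_vf(gpu0, free).name)  # prints "pf0-vf3"
```

In this picture the scheduler (or a node agent) would run something like `pick_vf` for the GPU assigned to the pod and hand the chosen VF to the CNI, so the kubelet itself does not need to change.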

Additional context

I hope we can build this feature together.

Thanks for opening this issue, @XiaoDouGeGe. If there is any progress, we will update it here.