[Core] Incorrectly detected TPU on an HPU-only node.
Opened this issue · 2 comments
woshiyyya commented
What happened + What you expected to happen
The author of this PR runs a distributed training workload on an 8-HPU node; however, Ray reports an additional TPU in the cluster. This could be a bug in Ray Core's device detection.
Versions / Dependencies
nightly
Reproduction script
Issue Severity
Low: It annoys or frustrates me.
rynewang commented
@allenwang28 would you mind taking a look?
allenwang28 commented
Thanks for the tag! Does the HPU node have anything listed under /dev/vfio or /dev/accel*?
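
If it helps, here is a minimal sketch for checking those two locations on the node. It only lists what is present under the paths mentioned above and does not reproduce Ray's actual detection logic. The function name `list_candidate_device_entries` and the `dev_root` parameter are hypothetical, added for illustration and so the check can be pointed at a test directory.

```python
import glob
import os


def list_candidate_device_entries(dev_root="/dev"):
    """List entries matching <dev_root>/accel* and the contents of <dev_root>/vfio.

    Hypothetical helper for this thread; not part of Ray.
    """
    # Paths such as /dev/accel0, /dev/accel1, or a /dev/accel directory.
    accel_entries = sorted(glob.glob(os.path.join(dev_root, "accel*")))

    # Names inside /dev/vfio, if that directory exists and is readable.
    vfio_dir = os.path.join(dev_root, "vfio")
    try:
        vfio_entries = sorted(os.listdir(vfio_dir))
    except (FileNotFoundError, NotADirectoryError, PermissionError):
        vfio_entries = []

    return {"accel": accel_entries, "vfio": vfio_entries}


if __name__ == "__main__":
    for kind, entries in list_candidate_device_entries().items():
        print(f"{kind}: {entries if entries else 'nothing found'}")
```

Running this directly on the HPU node and pasting the output here would show whether anything exists at either location.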