/dev/kfd no longer exists
PhilipDeegan opened this issue · 11 comments
Not sure where it went.
Is this expected?
This is on your host machine, or in your container?
Host: rocm-dkms became an optional package and was seemingly removed.
Possibly something in the 1.6 to 1.7 upgrade?
Something went wrong with the packages. rocm-dkms should not have been removed; it's a meta package that references a bunch of subpackages. I assume that if you run lsmod now, you will no longer see the amdgpu or amdkfd kernel modules loaded. The amdkfd module is what provides the /dev/kfd device.
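A quick way to check the state described above is to look for the device node and the two modules together. This is a minimal sketch, not an official ROCm tool: it reads /proc/modules (the file lsmod formats) instead of parsing lsmod output, and the function names are hypothetical. It assumes the ROCm 1.7-era setup from this thread, where amdkfd is a separate module; on newer kernels KFD is built into amdgpu, so amdkfd may not be listed at all.

```shell
#!/bin/sh
# Hypothetical helper: report whether /dev/kfd exists and whether the
# amdgpu/amdkfd kernel modules are loaded.
# Assumption: amdkfd is a separate module (ROCm 1.7-era layout); on newer
# kernels KFD lives inside amdgpu and amdkfd will show as "not loaded".

check_kfd_node() {
    if [ -e /dev/kfd ]; then
        echo "/dev/kfd: present"
    else
        echo "/dev/kfd: missing"
    fi
}

check_module() {
    # Each line of /proc/modules starts with the module name and a space.
    if grep -q "^$1 " /proc/modules 2>/dev/null; then
        echo "$1: loaded"
    else
        echo "$1: not loaded"
    fi
}

check_kfd_node
check_module amdgpu
check_module amdkfd
```

If /dev/kfd is missing and both modules show as not loaded, that matches the broken-package situation in this issue.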
I tried doing a manual modprobe on amdkfd, but I got an exec error.
Doing a full remove/reinstall now, will let you know. Thanks
That did the trick. Can you confirm, just for my own curiosity: is the ROCm 4.11 kernel still supposed to be there? I have it, but I don't see a "rocm-kernel" package anymore, so I'm not sure if it's obsolete.
Thanks
Great. Yes, I think the rocm-kernel package is obsolete. The equivalent package in our new DKMS world is rock-dkms. With DKMS you no longer have a custom monolithic ROCm kernel: you run the stock Ubuntu kernel, and the ROCm driver is built and loaded as a kernel module through DKMS. On a stock Ubuntu 16.04 install, running uname -r should report a 4.4 kernel version.
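To see which kind of kernel is actually running, the uname -r output can be checked against the naming of the old custom kernels. This is a hedged sketch: the "kfd-compute-rocm" pattern is an assumption inferred from the release string quoted in this thread (4.11.0-kfd-compute-rocm-rel-1.6-180), and classify_kernel is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: tell whether a kernel release string belongs to one
# of the old custom ROCm kernels or to some other (e.g. stock Ubuntu) kernel.
# Assumption: custom ROCm kernels contain "kfd-compute-rocm" in the release.

classify_kernel() {
    # Use the argument if given, otherwise the running kernel's release.
    release="${1:-$(uname -r)}"
    case "$release" in
        *kfd-compute-rocm*)
            echo "custom ROCm kernel: $release" ;;
        *)
            echo "non-ROCm kernel: $release" ;;
    esac
}

classify_kernel
```

Passing a release string as the argument lets you check an installed-but-not-running kernel as well.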
Yeah, I still have 4.11.0-kfd-compute-rocm-rel-1.6-180 somehow.
So the normal host kernel should work now? I thought it would only work with 4.16.
That kernel is from ROCm 1.6 (DKMS does away with custom kernels). You should be able to revert to the stock Ubuntu kernel with:

sudo dpkg --purge linux-headers-4.11.0-kfd-compute-rocm-rel-1.6-180 linux-image-4.11.0-kfd-compute-rocm-rel-1.6-180

It sounds like you may now have a mixture of 1.6 and 1.7 packages, but it would be best (i.e. the tested configuration) to double-check that all the 1.6 packages are removed.
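To double-check for leftover 1.6 kernel packages, you can filter the package list for the custom kernel names. This is a minimal sketch, assuming the old packages follow the linux-headers-/linux-image-...-kfd-compute-rocm-rel-1.6 naming seen above; find_rocm16_kernel_pkgs is a hypothetical helper. It only covers the custom kernel packages, not every 1.6 package, and it prints the purge command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: read package names (one per line) on stdin and keep
# only the ROCm 1.6-era custom kernel packages.
# Assumption: they are named linux-headers-/linux-image-...kfd-compute-rocm-rel-1.6...
find_rocm16_kernel_pkgs() {
    grep -E '^linux-(headers|image)-.*kfd-compute-rocm-rel-1\.6' || true
}

if command -v dpkg-query >/dev/null 2>&1; then
    # List every package name dpkg knows about, then filter it.
    leftovers=$(dpkg-query -W -f='${Package}\n' 2>/dev/null | find_rocm16_kernel_pkgs)
    if [ -n "$leftovers" ]; then
        echo "Leftover ROCm 1.6 kernel packages:"
        echo "$leftovers"
        # Unquoted on purpose so the names are joined with spaces.
        echo "Suggested: sudo dpkg --purge" $leftovers
    else
        echo "No ROCm 1.6 kernel packages found."
    fi
else
    echo "dpkg-query not found; this check is for Debian/Ubuntu hosts."
fi
```

Printing rather than purging lets you review the list first.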
Congrats to all involved; that was a bit of a pain point.
I found the uninstall directions for 1.6 here. It wouldn't hurt to try them. You may unintentionally uninstall a 1.7 package (not sure), but that would be trivial to reinstall with apt install.