CachyOS/linux-cachyos

amdgpu crash when opening libreoffice on linux-cachyos

Closed this issue · 15 comments

kxxt commented

Hi, I am hitting an amdgpu crash when opening libreoffice on linux-cachyos 6.7.1-2.

I have encountered two different scenarios so far:

  1. libreoffice hangs and there is a kernel NULL pointer dereference: https://fars.ee/iS7W
  2. libreoffice works, but dmesg shows errors from amdgpu: https://fars.ee/ipU_

On Arch Linux's linux 6.7.1-arch1-1, there is no crash.
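
To tell the two scenarios apart without reading the whole log, something like the following could be used. This is a minimal sketch: `classify_kernel_log` is a hypothetical helper, and the match strings are assumptions about typical kernel and amdgpu messages, not text copied from the pastes linked above.

```shell
# Hypothetical helper: classify kernel log text read from stdin.
# The patterns below are assumptions about common message wording.
classify_kernel_log() {
    _log=$(cat)
    if printf '%s\n' "$_log" | grep -qi 'NULL pointer dereference'; then
        # Scenario 1: kernel oops (libreoffice hangs)
        echo "null-deref"
    elif printf '%s\n' "$_log" | grep -i 'amdgpu' | grep -qiE 'error|failed|timeout'; then
        # Scenario 2: libreoffice works, but amdgpu logs errors
        echo "amdgpu-errors"
    else
        echo "clean"
    fi
}
```

Possible usage right after starting libreoffice: `sudo dmesg | classify_kernel_log` or `journalctl -k -b | classify_kernel_log`.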

Could you try disabling the iGPU?

kxxt commented

I am using Ryzen 7950x's integrated GPU. Not a separate one.

Could you try installing the kernel from cachyos/linux-cachyos and cachyos/linux-cachyos-headers?

kxxt commented

Yes. On cachyos/linux-cachyos there's no crash. So maybe some x86_64_v3 optimizations break cachyos-v3/linux-cachyos?

Are you not using the x86-64-v4 repository?

My guess would be that the iGPU somehow is not able to work with the avx512 instructions (even though avx512f is disabled in the kernel build, other instructions are still applied).

To me this looks very much like an AMD (iGPU) issue. I might forward it to AMD, but since it is probably a physical (hardware) problem, they wouldn't be able to fix that.

I had the same issue on my 7950X3D and disabling the iGPU fixes it.

kxxt commented

Thanks for your help. I don't have a separate GPU right now. I will also try the x86_64_v4 kernel.

Oh, so it also affects the v3 kernel.
As written above, I think there is some kind of wrong behaviour in instruction handling.

@superm1 Could you maybe help there?
Context:

  1. Kernel compiled with x86-64-v3 or x86-64-v4 instructions. In the arch/x86/Makefile avx, avx2 and avx512 are DISABLED.
  2. When using libreoffice (and it appears only libreoffice so far), we get a crash (I had a complete lockup every time with the iGPU enabled), even when it should be using my dedicated GPU (happened with a 4070 Super and a 1070 Ti).
  3. "Downgrading" to the kernel that is compiled with the default generic target makes it work without problems.

Thanks for your help. If you want a separate Bugzilla issue, let me know.

kxxt commented

❯ uname -a
Linux tuf 6.7.1-2-cachyos #1 SMP PREEMPT_DYNAMIC Sat, 20 Jan 2024 18:08:54 +0000 x86_64 GNU/Linux

~
❯ pacman -Qi linux-cachyos
Installed From  : cachyos-v4
Name            : linux-cachyos

x86_64_v4 kernel works fine for me.
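
Since the diagnosis here depends on which repository variant the kernel package came from, that field can be pulled out directly. A minimal sketch, assuming `pacman -Qi` prints an `Installed From` line as in the output above (stock pacman builds may not); `installed_from` is a hypothetical helper name.

```shell
# Hypothetical helper: print the value of the "Installed From" field
# from `pacman -Qi` output read on stdin (empty if the field is absent).
installed_from() {
    awk -F' *: *' '$1 == "Installed From" { print $2; exit }'
}
```

Possible usage: `pacman -Qi linux-cachyos | installed_from`, checked alongside `uname -r` to confirm the running kernel matches the installed package.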

kxxt commented

It's odd. Now I can't even reproduce the bug on the x86_64_v3 kernel.

kxxt commented

Well. Looks like it only happens on freshly installed libreoffice. After manually removing ~/.config/libreoffice/ I could reproduce the bug again. I will retest on the v4 kernel and ordinary kernel.

kxxt commented

I can reproduce this bug with cachyos/linux-cachyos after clearing ~/.config/libreoffice/. So it's unlikely to be an x86_64_{v3,v4}-related bug.

kxxt commented

It also reproduces on linux and linux-lts after clearing ~/.config/libreoffice/. So it's not a CachyOS-specific issue. I am closing this issue. Sorry for the disturbance.
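
The reproduction condition found above (a fresh ~/.config/libreoffice/) can be recreated without deleting the existing profile. A minimal sketch: `reset_profile` is a hypothetical helper, and the `.bak.<timestamp>` backup naming is an assumption chosen for illustration.

```shell
# Hypothetical helper: move a LibreOffice profile directory aside so the
# next start creates a fresh one. Prints the backup path on success.
reset_profile() {
    _profile=${1:-"$HOME/.config/libreoffice"}
    if [ ! -d "$_profile" ]; then
        echo "no profile at $_profile" >&2
        return 1
    fi
    _backup="$_profile.bak.$(date +%s)"   # assumed naming scheme
    mv -- "$_profile" "$_backup" && echo "$_backup"
}
```

A reproduction attempt would then be: run `reset_profile`, start `libreoffice`, and inspect `journalctl -k -b` (or `dmesg`) for amdgpu messages. Moving the backup directory back restores the old profile.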

Would you maybe file a bug report with LibreOffice?

@superm1 Could you maybe help there?

I can't access the links posted to this thread. But I'll generically say "If there is a NULL pointer error in amdgpu then there is probably a bug in either mesa or the kernel and it should be reported to the AMD DRM bug tracker with a set of reproduce steps".

Kernel compiled with x86-64-v3 or x86-64-v4 instructions. In the arch/x86/Makefile avx, avx2 and avx512 are DISABLED.

That being said, if you're seeing a crash specifically with x86_64_v3/x86_64_v4, this could be a CPU bug too. If that's the case, you should raise a kernel Bugzilla for it. But we really need good reproducer steps for it. A simple C application and a matching kconfig would be preferable.