IGCIT/Intel-GPU-Community-Issue-Tracker-IGCIT

Enabling max power saving (ASPM L1) can sometimes leave the GPU short of power and cause a BSOD.

Opened this issue · 17 comments

Checklist [README]

  • Device is not undervolted nor overclocked
  • Device is using the latest drivers
  • Application is not cracked or modded and uses the latest patch

Application [Required]

Any

Processor / Processor Number [Required]

AMD Ryzen5 7500F

Graphic Card [Required]

Intel A770LE 16GB

GPU Driver Version [Required]

31.0.101.5382

Other GPU Driver version

No response

Rendering API [Required]

  • Vulkan
  • OpenGL
  • DirectX12
  • DirectX11
  • DirectX10
  • DirectX9
  • Not applicable

Windows Build Number [Required]

  • Windows 11 23H2
  • Windows 11 22H2
  • Windows 11 21H2
  • Windows 10 22H2
  • Windows 10 21H2
  • Other (Please specify)

Other Windows build number

No response

Intel System Support Utility report

#703

Description and steps to reproduce [Required]

I finally found the cause of the BSOD (#703).
Upgrading the GPU drivers (5333/5382) and replacing the PSU (Seasonic FOCUS GX 750) didn't help; the system still BSODs randomly when left powered on 24/7.

I found the same problem reported by other users. Someone suggested turning off maximum power saving; I tried it and haven't had a BSOD since.
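For anyone who prefers to switch the setting from a script, here is a minimal sketch of the workaround above. It assumes the Windows `powercfg` aliases `SUB_PCIEXPRESS` and `ASPM` are present on your build (check with `powercfg /aliases`) and that the values 0/1/2 map to Off / Moderate / Maximum power savings, as in the power options dialog. The function names are made up for illustration.

```python
import subprocess

# Link State Power Management levels, as labelled in the Windows power options UI.
# Assumption: 0 = Off, 1 = Moderate power savings, 2 = Maximum power savings.
ASPM_LEVELS = {"off": 0, "moderate": 1, "maximum": 2}


def aspm_commands(level):
    """Build the powercfg calls that set PCIe Link State Power Management."""
    value = str(ASPM_LEVELS[level])
    return [
        # Plugged-in (AC) and battery (DC) values of the current power scheme.
        ["powercfg", "/setacvalueindex", "SCHEME_CURRENT", "SUB_PCIEXPRESS", "ASPM", value],
        ["powercfg", "/setdcvalueindex", "SCHEME_CURRENT", "SUB_PCIEXPRESS", "ASPM", value],
        # Re-activate the scheme so the changed value is applied.
        ["powercfg", "/setactive", "SCHEME_CURRENT"],
    ]


def apply_aspm(level):
    """Run the commands (Windows only, from an elevated prompt)."""
    for cmd in aspm_commands(level):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only print the commands here; call apply_aspm("off") to actually run them.
    for cmd in aspm_commands("off"):
        print(" ".join(cmd))
```

Calling `apply_aspm("maximum")` would re-enable the setting that triggers the BSOD here, which may be useful when trying to reproduce it.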

ref: how to activate max power saving (ASPM L1)
reddit #1, reddit #2, reddit #3, reddit #4

Note: increasing the TDR delay only postpones the BSOD; it doesn't prevent it.

Device / Platform

MAG B650M MORTAR WIFI

Crash dumps [Required, if applicable]

No response

Application / Windows logs

No response

Same here, with a 7800X3D on an ASUS TUF GAMING B650-PLUS WIFI motherboard + Arc A770 LE.
It even bricked Windows 10 in my case, so I've had to remove the GPU, boot with GFX from the APU and repair the OS so it would boot again ... I sure ain't going to try this one again until Intel acknowledges an official fix further down the line ^^"

To be clear, ASPM was enabled in the BIOS, and the system BSOD'd instantly when I enabled PCIe maximum power savings in Windows, and would then boot loop over and over again.

@el-psy-k Thanks for reporting it. I will check on my end. Meanwhile, can you share the BSOD dump if you have collected one with the latest driver?

@Vivek-Intel Just turned on max power saving now, will post here as soon as I collect the BSOD dump.

I also have this issue. My comment is even under one of the links in the main post. But I get a different BSOD message each time I enable ASPM (CRITICAL_PROCESS_DIED), and I don't know whether it can be analyzed under the same GitHub issue.

@Vivek-Intel I'm also willing to provide more details about the BSOD I've encountered, but I need instructions on exactly what is needed and how to collect this information. Thank you.

@Vivek-Intel Here's the newest bsod dump with graphics driver 31.0.101.5445.
Memory passes the memtest 48+ hour tests with 0 errors.
043024-16062-01.dmp

@dieselistus See: generate SSU report, obtain crash dumps.
You can open a new issue. If Intel engineers can replicate and fix it, all users will benefit.

Hi @el-psy-k, I have been trying to reproduce this issue in our lab. I did not see it on my AMD + A770 setup with the setting described above while playing multiple games. I will run more trials with different screens and benchmarks to see whether the issue appears.

@Vivek-Intel @el-psy-k I suppose the issue is related to a certain version of the BIOS (AGESA).

Alright, I was feeling a little frisky today, so I had another shot.

After flashing the latest BIOS for my mobo (which comes with AGESA 1.1.7.0 patch A): no BSOD this time, but a black screen with some white-line artifacts.
For comparison, booting into Linux with the same settings produces a black screen with horizontal white lines everywhere. Booting into Linux without ASPM works just fine.
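When cross-checking on Linux, it can help to confirm which ASPM policy the kernel is actually using. A minimal sketch, assuming the kernel exposes `/sys/module/pcie_aspm/parameters/policy` and marks the active policy with square brackets (the helper name `active_policy` is made up for illustration):

```python
from pathlib import Path

# Assumption: this sysfs file exists on kernels built with PCIe ASPM support.
POLICY_FILE = Path("/sys/module/pcie_aspm/parameters/policy")


def active_policy(text):
    """Return the bracketed entry, e.g. 'powersave' from 'default [powersave]'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


if __name__ == "__main__":
    if POLICY_FILE.exists():
        print(active_policy(POLICY_FILE.read_text()))
    else:
        print("pcie_aspm policy file not found")
```

This only reports the kernel-side policy; whether the link to the GPU really enters L1 still depends on the BIOS and the device.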

My previous attempts were on Windows 10, this was on Windows 11.
This time, I had ASPM enabled in the BIOS (for L1 only), and power savings enabled in Windows via the power saving power profile, but the monitor refresh rate was still at 144 Hz. So far no crash, but the GPU was still using 40 W at idle.
As soon as I dropped the refresh rate to 60 Hz (I guess ASPM kicked in at that precise moment) ... I had a forever black screen >_<"

The oddball thing is, after flashing the BIOS using flashback (not ezflash) and entering the BIOS config, the same artifacts were laid all over the UI, making the whole thing unusable. I chalk this up to the BIOS having ASPM enabled by default (I checked this). So it's definitely not an OS issue, and not a driver issue either. It has to be strictly firmware related, based on what I've seen today, and I'm still suspecting the GPU's VBIOS, not necessarily the mobo's BIOS.

@Vivek-Intel: can you take this new information into account when testing on your side?
I'll try cross-testing with my RX 5700XT later on and will keep you updated.

@pcslide I tried every BIOS version; it doesn't help.

@freak2fast4u You can post the BSOD dump here. Is it caused by igdkmdnd64.sys?

Hi @el-psy-k, I kept my AMD host + A770 system under test over the weekend with multiple workloads running, maximum power saving on, and ASPM L1/L0 enabled. I will let you know if I see the issue on my end. I will also ask my team to try it on other hosts, as we do not have the exact same motherboard model as yours.

I am referring to the SSU report you shared in the old case, but I hope you are using the latest driver and the latest BIOS. Can you share the VBIOS version of the GPU?

@freak2fast4u Thank you for testing. A blank screen after changing the refresh rate may or may not be the same as this issue, or ASPM-specific. I would suggest trying another monitor if possible and creating a new thread so that we can isolate it better. I did try a 144 Hz monitor and lowered the refresh rate, but I could not reproduce the issue you faced.


@Vivek-Intel
IFWI(V-BIOS): 20.0.1068
Graphics Driver Version: 31.0.101.5445
MotherBoard Bios Version: 7D76vAB(AGESA 1.1.0.2b)

I tried 7D76vAC (AGESA 1.1.0.2b Patch A): with ASPM L1 turned on, the random BSODs occur as usual. 7D76vAC also has a bug that causes high idle CPU usage, so I rolled back to the previous version.

Please use HWiNFO to check the ASPM status and make sure it shows L1 Entry.

The BSOD usually occurs when the GPU load changes frequently, and when the monitor is turned off and on multiple times.
Suggestion: launch Chrome with the --enable-features=IntelVpSuperResolution command-line flag and play long videos at a resolution lower than your monitor's, so that the GPU load is constantly changing.
Then use an automation tool such as AutoHotkey to turn the monitor off and on at intervals.
This might help you reproduce the issue.
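As an alternative to AutoHotkey, the monitor off/on cycling could be sketched in Python roughly as follows. This is only an illustration under assumptions: it posts `WM_SYSCOMMAND` / `SC_MONITORPOWER` through the Win32 `user32` API to blank the display and nudges the mouse to wake it, which may behave differently depending on the Windows build and driver. The function names and the intervals are made up.

```python
import ctypes
import sys
import time

# Win32 constants (values from the Windows SDK headers).
HWND_BROADCAST = 0xFFFF
WM_SYSCOMMAND = 0x0112
SC_MONITORPOWER = 0xF170
MOUSEEVENTF_MOVE = 0x0001


def set_monitor(on):
    """Windows only: power the display off, or nudge the mouse to wake it."""
    user32 = ctypes.windll.user32
    if on:
        # Assumption: a 1-pixel relative mouse move is enough to wake the display.
        user32.mouse_event(MOUSEEVENTF_MOVE, 0, 1, 0, 0)
    else:
        # lParam 2 requests that the display be powered off.
        user32.PostMessageW(HWND_BROADCAST, WM_SYSCOMMAND, SC_MONITORPOWER, 2)


def cycle_monitor(cycles, off_seconds, on_seconds, switch=set_monitor, sleep=time.sleep):
    """Turn the monitor off and on `cycles` times; return the number of switches."""
    switches = 0
    for _ in range(cycles):
        switch(False)
        sleep(off_seconds)
        switch(True)
        sleep(on_seconds)
        switches += 2
    return switches


if __name__ == "__main__" and sys.platform == "win32":
    # Hypothetical intervals; adjust to whatever triggers the issue on your system.
    cycle_monitor(cycles=100, off_seconds=30, on_seconds=60)
```

Run it while the Chrome video from the suggestion above is playing, so that the load changes and the display power transitions overlap.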

@Vivek-Intel BSOD again. The system was idle and the monitor was in sleep mode when this happened.
050624-16046-01.dmp

Thanks @el-psy-k. I am checking with the developers, but please note that this issue is intermittent: I was able to reproduce it only once while using the system continuously over the past week. It may take time if the developers need live debugging or more information to root-cause it.

I have opened an issue with the engineering team (bug ID 15016023487) for your reference.
We cannot commit to a timeline for this issue, but we will keep you all updated if there is any news.