matszpk/amdcovc

Weird ADL metrics for GPU

Brightside56 opened this issue · 3 comments

Adapter 0: Malta [Radeon HD 7990]
  Core: 300 MHz, Mem: 150 MHz, Vddc: 0.85 V, Load: 0%, Temp: 37 C, Fan: 20%
  Max Ranges: Core: 150 - 1100 MHz, Mem: 75 - 1575 MHz, Vddc: 0.85 - 1.2 V
  PerfLevels: Core: 300 - 1000 MHz, Mem: 150 - 1500 MHz, Vddc: 0.85 - 1.2 V
Adapter 1: Malta [Radeon HD 7990]
  Core: 501 MHz, Mem: 1500 MHz, Vddc: 0.95 V, Load: 0%, Temp: 47 C, Fan: 20%
  Max Ranges: Core: 150 - 1100 MHz, Mem: 75 - 1575 MHz, Vddc: 0.85 - 1.2 V
  PerfLevels: Core: 300 - 1000 MHz, Mem: 150 - 1500 MHz, Vddc: 0.85 - 1.2 V
Adapter 2: Malta [Radeon HD 7990]
  Core: 501 MHz, Mem: 1500 MHz, Vddc: 0.95 V, Load: 0%, Temp: 55 C, Fan: 22%
  Max Ranges: Core: 150 - 1100 MHz, Mem: 75 - 1575 MHz, Vddc: 0.85 - 1.2 V
  PerfLevels: Core: 300 - 1000 MHz, Mem: 150 - 1500 MHz, Vddc: 0.85 - 1.2 V
Adapter 3: Malta [Radeon HD 7990]
  Core: 501 MHz, Mem: 1500 MHz, Vddc: 0.95 V, Load: 0%, Temp: 54 C, Fan: 21%
  Max Ranges: Core: 150 - 1100 MHz, Mem: 75 - 1575 MHz, Vddc: 0.85 - 1.2 V
  PerfLevels: Core: 300 - 1000 MHz, Mem: 150 - 1500 MHz, Vddc: 0.85 - 1.2 V
Adapter 4: Malta [Radeon HD 7990]
  Core: 501 MHz, Mem: 1500 MHz, Vddc: 0.95 V, Load: 0%, Temp: 49 C, Fan: 20%
  Max Ranges: Core: 150 - 1100 MHz, Mem: 75 - 1575 MHz, Vddc: 0.85 - 1.2 V
  PerfLevels: Core: 300 - 1000 MHz, Mem: 150 - 1500 MHz, Vddc: 0.85 - 1.2 V
Adapter 5: Malta [Radeon HD 7990]
  Core: 0 MHz, Mem: 0 MHz, Vddc: 0 V, Load: 200%, Temp: 511 C, Fan: 100%
  Max Ranges: Core: 150 - 1100 MHz, Mem: 75 - 1575 MHz, Vddc: 0.85 - 1.2 V
  PerfLevels: Core: 300 - 1000 MHz, Mem: 150 - 1500 MHz, Vddc: 0.85 - 1.2 V

Please take a look at Adapter 5. My system is Ubuntu 14.04 with the latest fglrx drivers and ADL_SDK. I compiled amdcovc from source (latest master), but I tried the release too. Sometimes it shows impossible temperature values for Adapter 5 (511 C), and the utilization value is wrong as well (sometimes over 40k %). Please help!

I have seen 511 C before. It is the value lm-sensors typically reports when my GPU is hung.

Look at the Core, Mem and Vddc values there: they are all 0. That also means the GPU is hung.

@valeriob01, yes, you're right. It's overheating and hangs sometimes.
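
For anyone who wants to catch this automatically, here is a minimal sketch of the check described in the comments above: flag an adapter as probably hung when Core, Mem and Vddc are all 0, or when the temperature or load reading is implausible. It only parses text in the format pasted in this issue. The function name `find_hung_adapters` and the `max_temp` threshold of 150 C are my own assumptions, not part of amdcovc or ADL, and treating any load above 100% as bogus is a guess based on the 200% reading shown here.

```python
import re

# Per-adapter status line, in the format of the output quoted in this issue.
# The "Max Ranges" and "PerfLevels" lines do not match because they hold ranges.
STATUS_RE = re.compile(
    r"Core: (?P<core>[\d.]+) MHz, Mem: (?P<mem>[\d.]+) MHz, "
    r"Vddc: (?P<vddc>[\d.]+) V, Load: (?P<load>[\d.]+)%, "
    r"Temp: (?P<temp>[\d.]+) C, Fan: (?P<fan>[\d.]+)%"
)
ADAPTER_RE = re.compile(r"^Adapter (\d+):")


def find_hung_adapters(text, max_temp=150.0):
    """Return indices of adapters whose readings look like a hung GPU.

    max_temp is an assumed sanity threshold, not a value taken from ADL.
    """
    hung = []
    adapter = None
    for line in text.splitlines():
        header = ADAPTER_RE.match(line.strip())
        if header:
            adapter = int(header.group(1))
            continue
        status = STATUS_RE.search(line)
        if status and adapter is not None:
            v = {k: float(x) for k, x in status.groupdict().items()}
            # Heuristic from the thread: all-zero clocks and voltage.
            zero_clocks = v["core"] == 0 and v["mem"] == 0 and v["vddc"] == 0
            # Assumed heuristic: readings outside any physically sane range.
            bogus = v["temp"] > max_temp or v["load"] > 100
            if zero_clocks or bogus:
                hung.append(adapter)
    return hung


SAMPLE = """\
Adapter 0: Malta [Radeon HD 7990]
  Core: 300 MHz, Mem: 150 MHz, Vddc: 0.85 V, Load: 0%, Temp: 37 C, Fan: 20%
  Max Ranges: Core: 150 - 1100 MHz, Mem: 75 - 1575 MHz, Vddc: 0.85 - 1.2 V
Adapter 5: Malta [Radeon HD 7990]
  Core: 0 MHz, Mem: 0 MHz, Vddc: 0 V, Load: 200%, Temp: 511 C, Fan: 100%
  Max Ranges: Core: 150 - 1100 MHz, Mem: 75 - 1575 MHz, Vddc: 0.85 - 1.2 V
"""

print(find_hung_adapters(SAMPLE))  # prints [5]
```

This only tells you that the readings are garbage. It does not fix the underlying overheating, so the card still needs better cooling or lower clocks.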