clbr/radeontop

Reported memory usage exceeds maximum values

Kernel: 5.9.1-zen1-1-zen
OS: Arch Linux
GPU: Radeon R5 230
MoBo: Asus PRIME-B350-PLUS

07:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Caicos PRO [Radeon HD 7450] [1002:677b]
	Subsystem: Micro-Star International Co., Ltd. [MSI] Device [1462:8099]
	Kernel driver in use: radeon
	Kernel modules: radeon
07:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Caicos HDMI Audio [Radeon HD 6450 / 7450/8450/8490 OEM / R5 230/235/235X OEM] [1002:a...
	Subsystem: Micro-Star International Co., Ltd. [MSI] Device [1462:aa98]
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel

[Screenshot: ss-2020-10-28T17:18:53]

clbr commented

Oh my. And no errors like "Failed to get VRAM usage"? Can you build from git and add printfs around the VRAM query in radeon.c?

Yes, there was no "Failed to get VRAM usage" reported when I exited the program.
I am not familiar with the code, so I am not sure whether this is what you meant me to print, but I suspect out64 holds the value (although it is declared with the unused attribute).

diff --git a/radeon.c b/radeon.c
index 3beaf8b..a81f5a8 100644
--- a/radeon.c
+++ b/radeon.c
@@ -122,8 +122,11 @@ void init_radeon(int fd, int drm_major, int drm_minor) {
                vramsize = gem.vram_size;
                gttsize = gem.gart_size;
 
-               if (!(ret = getvram_radeon(&out64)))
+               fprintf(stdout, "%lu\n", out64);
+               if (!(ret = getvram_radeon(&out64))) {
                        getvram = getvram_radeon;
+                       fprintf(stdout, "%lu\n", out64);
+               }
                else
                        drmError(ret, _("Failed to get VRAM usage"));

The values printed out were the following:

139911984272822
1603707379712
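
Note on the patch above: the first fprintf runs before getvram_radeon() is called, so the first number is presumably whatever out64 happened to contain at that point, and only the second number is the value the kernel returned. Also, "%lu" matches uint64_t only on platforms where unsigned long is 64 bits wide. A minimal C sketch of portable printing plus a plausibility check follows. format_bytes and usage_is_plausible are hypothetical helpers, not radeontop functions, and the 1 GiB total in the usage note is an assumed figure, not a value taken from this report.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not part of radeontop): returns 1 when the
 * reported usage fits within the reported total, 0 otherwise. */
static int usage_is_plausible(uint64_t used, uint64_t total)
{
	return used <= total;
}

/* Hypothetical helper: formats a uint64_t portably with PRIu64.
 * Returns the number of characters written, as snprintf does. */
static int format_bytes(char *buf, size_t len, uint64_t value)
{
	return snprintf(buf, len, "%" PRIu64, value);
}
```

With an assumed total of 1 GiB, usage_is_plausible(UINT64_C(1603707379712), UINT64_C(1) << 30) returns 0, which flags the second printed value as out of range.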

clbr commented

Well, it looks like the kernel is reporting success while returning wrong values. There is not much radeontop can do about that. Either this is a kernel bug, or the kernel has changed the ABI, which violates its stability guarantees and is therefore also a bug.

clbr commented

If you'd like to report a kernel bug, a git bisect would help, but even a coarse "version 5.x works, 5.x+1 fails" would be enough to start the report.

It seems that the issue is resolved on kernel 5.11.1.
I tested on Arch Linux: it shows the true values now, but in-use VRAM/GTT is now always zero!

[Screenshot: Screenshot_2021-02-24_12-45-57]

I see similar, if not identical, behaviour to what @chromer030 described regarding the VRAM and GTT values, also running Arch on v5.11.1.
The last time I ran the Arch linux-lts kernel, radeontop reported values that seemed normal, i.e. neither pegged to zero nor exceeding the upper limit.
I do not quite remember whether that was the linux-lts version that is currently available (5.10.17) or an older one.

@mi12078
According to the Arch Git history, linux-lts switched to the 5.10 LTS branch on Feb 14, 2021.
[Screenshot: Screenshot_2021-02-24_14-25-15]

Prior to the 5.9.x series, everything was OK.

> I do not quite remember whether that was the linux-lts version that is currently available (5.10.17) or an older one.

I think you were on the 5.4.x series, which reported the Radeon values correctly.

Debian 11 (bullseye) with kernel 5.10.0-14-amd64 just started doing the same thing today. The last update of any kind was 10 days ago.