--layout reports incorrect memory layout for EPYC 7xx2 CPU
jhoblitt opened this issue · 3 comments
jhoblitt commented
[jhoblitt@pillan06 rasdaemon]$ sudo ras-mc-ctl --layout
          +-----------------------------------------------------------------------------------------------+
          |                                              mc0                                              |
          |  csrow0   |  csrow1   |  csrow2   |  csrow3   |  csrow4   |  csrow5   |  csrow6   |  csrow7   |
----------+-----------------------------------------------------------------------------------------------+
channel7: |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |
channel6: |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |
----------+-----------------------------------------------------------------------------------------------+
channel5: |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |
channel4: |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |
----------+-----------------------------------------------------------------------------------------------+
channel3: |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |
channel2: |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |
----------+-----------------------------------------------------------------------------------------------+
channel1: |     0 MB  |     0 MB  | 32767 MB  | 32767 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |
channel0: |     0 MB  |     0 MB  | 32767 MB  | 32767 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |
----------+-----------------------------------------------------------------------------------------------+
[jhoblitt@pillan06 rasdaemon]$ free -g
              total        used        free      shared  buff/cache   available
Mem:            251          93          42           0         115         156
Swap:             0           0           0
[jhoblitt@pillan06 rasdaemon]$ sudo dmidecode --type 4
# dmidecode 3.2
Getting SMBIOS data from sysfs.
SMBIOS 3.2.0 present.
Handle 0x0029, DMI type 4, 48 bytes
Processor Information
Socket Designation: CPU
Type: Central Processor
Family: Zen
Manufacturer: Advanced Micro Devices, Inc.
ID: 10 0F 83 00 FF FB 8B 17
Signature: Family 23, Model 49, Stepping 0
Flags:
FPU (Floating-point unit on-chip)
VME (Virtual mode extension)
DE (Debugging extension)
PSE (Page size extension)
TSC (Time stamp counter)
MSR (Model specific registers)
PAE (Physical address extension)
MCE (Machine check exception)
CX8 (CMPXCHG8 instruction supported)
APIC (On-chip APIC hardware supported)
SEP (Fast system call)
MTRR (Memory type range registers)
PGE (Page global enable)
MCA (Machine check architecture)
CMOV (Conditional move instruction supported)
PAT (Page attribute table)
PSE-36 (36-bit page size extension)
CLFSH (CLFLUSH instruction supported)
MMX (MMX technology supported)
FXSR (FXSAVE and FXSTOR instructions supported)
SSE (Streaming SIMD extensions)
SSE2 (Streaming SIMD extensions 2)
HTT (Multi-threading)
Version: AMD EPYC 7502P 32-Core Processor
Voltage: 1.1 V
External Clock: 100 MHz
Max Speed: 3350 MHz
Current Speed: 2500 MHz
Status: Populated, Enabled
Upgrade: Socket SP3
L1 Cache Handle: 0x0026
L2 Cache Handle: 0x0027
L3 Cache Handle: 0x0028
Serial Number: Unknown
Asset Tag: Unknown
Part Number: Unknown
Core Count: 32
Core Enabled: 32
Thread Count: 64
Characteristics:
64-bit capable
Multi-Core
Hardware Thread
Execute Protection
Enhanced Virtualization
Power/Performance Control
[jhoblitt@pillan06 rasdaemon]$ sudo dmidecode --type 17
# dmidecode 3.2
Getting SMBIOS data from sysfs.
SMBIOS 3.2.0 present.
Handle 0x002B, DMI type 17, 84 bytes
Memory Device
Array Handle: 0x0023
Error Information Handle: 0x002A
Total Width: Unknown
Data Width: Unknown
Size: No Module Installed
Form Factor: Unknown
Set: None
Locator: DIMMA1
Bank Locator: P0_Node0_Channel0_Dimm0
Type: Unknown
Type Detail: Unknown
Handle 0x002D, DMI type 17, 84 bytes
Memory Device
Array Handle: 0x0023
Error Information Handle: 0x002C
Total Width: 72 bits
Data Width: 64 bits
Size: 32 GB
Form Factor: DIMM
Set: None
Locator: DIMMA2
Bank Locator: P0_Node0_Channel0_Dimm1
Type: DDR4
Type Detail: Synchronous Registered (Buffered)
Speed: 3200 MT/s
Manufacturer: Samsung
Serial Number: T0FN00014948EFE3B4
Asset Tag: DIMMA2_AssetTag (date:21/49)
Part Number: M393A4K40EB3-CWE
Rank: 2
Configured Memory Speed: 3200 MT/s
Minimum Voltage: 1.2 V
Maximum Voltage: 1.2 V
Configured Voltage: 1.2 V
Memory Technology: DRAM
Memory Operating Mode Capability: Volatile memory
Firmware Version: M393A4K40EB3-CWE
Module Manufacturer ID: Bank 1, Hex 0xCE
Module Product ID: Unknown
Memory Subsystem Controller Manufacturer ID: Unknown
Memory Subsystem Controller Product ID: Unknown
Non-Volatile Size: None
Volatile Size: 32 GB
Cache Size: None
Logical Size: None
Handle 0x0030, DMI type 17, 84 bytes
Memory Device
Array Handle: 0x0023
Error Information Handle: 0x002F
Total Width: Unknown
Data Width: Unknown
Size: No Module Installed
Form Factor: Unknown
Set: None
Locator: DIMMB1
Bank Locator: P0_Node0_Channel1_Dimm0
Type: Unknown
Type Detail: Unknown
Handle 0x0032, DMI type 17, 84 bytes
Memory Device
Array Handle: 0x0023
Error Information Handle: 0x0031
Total Width: 72 bits
Data Width: 64 bits
Size: 32 GB
Form Factor: DIMM
Set: None
Locator: DIMMB2
Bank Locator: P0_Node0_Channel1_Dimm1
Type: DDR4
Type Detail: Synchronous Registered (Buffered)
Speed: 3200 MT/s
Manufacturer: Samsung
Serial Number: T0FN00014948EFE54C
Asset Tag: DIMMB2_AssetTag (date:21/49)
Part Number: M393A4K40EB3-CWE
Rank: 2
Configured Memory Speed: 3200 MT/s
Minimum Voltage: 1.2 V
Maximum Voltage: 1.2 V
Configured Voltage: 1.2 V
Memory Technology: DRAM
Memory Operating Mode Capability: Volatile memory
Firmware Version: M393A4K40EB3-CWE
Module Manufacturer ID: Bank 1, Hex 0xCE
Module Product ID: Unknown
Memory Subsystem Controller Manufacturer ID: Unknown
Memory Subsystem Controller Product ID: Unknown
Non-Volatile Size: None
Volatile Size: 32 GB
Cache Size: None
Logical Size: None
Handle 0x0035, DMI type 17, 84 bytes
Memory Device
Array Handle: 0x0023
Error Information Handle: 0x0034
Total Width: Unknown
Data Width: Unknown
Size: No Module Installed
Form Factor: Unknown
Set: None
Locator: DIMMC1
Bank Locator: P0_Node0_Channel2_Dimm0
Type: Unknown
Type Detail: Unknown
Handle 0x0037, DMI type 17, 84 bytes
Memory Device
Array Handle: 0x0023
Error Information Handle: 0x0036
Total Width: 72 bits
Data Width: 64 bits
Size: 32 GB
Form Factor: DIMM
Set: None
Locator: DIMMC2
Bank Locator: P0_Node0_Channel2_Dimm1
Type: DDR4
Type Detail: Synchronous Registered (Buffered)
Speed: 3200 MT/s
Manufacturer: Samsung
Serial Number: T0FN00014948EFE495
Asset Tag: DIMMC2_AssetTag (date:21/49)
Part Number: M393A4K40EB3-CWE
Rank: 2
Configured Memory Speed: 3200 MT/s
Minimum Voltage: 1.2 V
Maximum Voltage: 1.2 V
Configured Voltage: 1.2 V
Memory Technology: DRAM
Memory Operating Mode Capability: Volatile memory
Firmware Version: M393A4K40EB3-CWE
Module Manufacturer ID: Bank 1, Hex 0xCE
Module Product ID: Unknown
Memory Subsystem Controller Manufacturer ID: Unknown
Memory Subsystem Controller Product ID: Unknown
Non-Volatile Size: None
Volatile Size: 32 GB
Cache Size: None
Logical Size: None
Handle 0x003A, DMI type 17, 84 bytes
Memory Device
Array Handle: 0x0023
Error Information Handle: 0x0039
Total Width: Unknown
Data Width: Unknown
Size: No Module Installed
Form Factor: Unknown
Set: None
Locator: DIMMD1
Bank Locator: P0_Node0_Channel3_Dimm0
Type: Unknown
Type Detail: Unknown
Handle 0x003C, DMI type 17, 84 bytes
Memory Device
Array Handle: 0x0023
Error Information Handle: 0x003B
Total Width: 72 bits
Data Width: 64 bits
Size: 32 GB
Form Factor: DIMM
Set: None
Locator: DIMMD2
Bank Locator: P0_Node0_Channel3_Dimm1
Type: DDR4
Type Detail: Synchronous Registered (Buffered)
Speed: 3200 MT/s
Manufacturer: Samsung
Serial Number: T0FN00014948EFE716
Asset Tag: DIMMD2_AssetTag (date:21/49)
Part Number: M393A4K40EB3-CWE
Rank: 2
Configured Memory Speed: 3200 MT/s
Minimum Voltage: 1.2 V
Maximum Voltage: 1.2 V
Configured Voltage: 1.2 V
Memory Technology: DRAM
Memory Operating Mode Capability: Volatile memory
Firmware Version: M393A4K40EB3-CWE
Module Manufacturer ID: Bank 1, Hex 0xCE
Module Product ID: Unknown
Memory Subsystem Controller Manufacturer ID: Unknown
Memory Subsystem Controller Product ID: Unknown
Non-Volatile Size: None
Volatile Size: 32 GB
Cache Size: None
Logical Size: None
Handle 0x003F, DMI type 17, 84 bytes
Memory Device
Array Handle: 0x0023
Error Information Handle: 0x003E
Total Width: Unknown
Data Width: Unknown
Size: No Module Installed
Form Factor: Unknown
Set: None
Locator: DIMME1
Bank Locator: P0_Node0_Channel4_Dimm0
Type: Unknown
Type Detail: Unknown
Handle 0x0041, DMI type 17, 84 bytes
Memory Device
Array Handle: 0x0023
Error Information Handle: 0x0040
Total Width: 72 bits
Data Width: 64 bits
Size: 32 GB
Form Factor: DIMM
Set: None
Locator: DIMME2
Bank Locator: P0_Node0_Channel4_Dimm1
Type: DDR4
Type Detail: Synchronous Registered (Buffered)
Speed: 3200 MT/s
Manufacturer: Samsung
Serial Number: T0FN00014948EFE698
Asset Tag: DIMME2_AssetTag (date:21/49)
Part Number: M393A4K40EB3-CWE
Rank: 2
Configured Memory Speed: 3200 MT/s
Minimum Voltage: 1.2 V
Maximum Voltage: 1.2 V
Configured Voltage: 1.2 V
Memory Technology: DRAM
Memory Operating Mode Capability: Volatile memory
Firmware Version: M393A4K40EB3-CWE
Module Manufacturer ID: Bank 1, Hex 0xCE
Module Product ID: Unknown
Memory Subsystem Controller Manufacturer ID: Unknown
Memory Subsystem Controller Product ID: Unknown
Non-Volatile Size: None
Volatile Size: 32 GB
Cache Size: None
Logical Size: None
Handle 0x0044, DMI type 17, 84 bytes
Memory Device
Array Handle: 0x0023
Error Information Handle: 0x0043
Total Width: Unknown
Data Width: Unknown
Size: No Module Installed
Form Factor: Unknown
Set: None
Locator: DIMMF1
Bank Locator: P0_Node0_Channel5_Dimm0
Type: Unknown
Type Detail: Unknown
Handle 0x0046, DMI type 17, 84 bytes
Memory Device
Array Handle: 0x0023
Error Information Handle: 0x0045
Total Width: 72 bits
Data Width: 64 bits
Size: 32 GB
Form Factor: DIMM
Set: None
Locator: DIMMF2
Bank Locator: P0_Node0_Channel5_Dimm1
Type: DDR4
Type Detail: Synchronous Registered (Buffered)
Speed: 3200 MT/s
Manufacturer: Samsung
Serial Number: T0FN00014948EFE3B8
Asset Tag: DIMMF2_AssetTag (date:21/49)
Part Number: M393A4K40EB3-CWE
Rank: 2
Configured Memory Speed: 3200 MT/s
Minimum Voltage: 1.2 V
Maximum Voltage: 1.2 V
Configured Voltage: 1.2 V
Memory Technology: DRAM
Memory Operating Mode Capability: Volatile memory
Firmware Version: M393A4K40EB3-CWE
Module Manufacturer ID: Bank 1, Hex 0xCE
Module Product ID: Unknown
Memory Subsystem Controller Manufacturer ID: Unknown
Memory Subsystem Controller Product ID: Unknown
Non-Volatile Size: None
Volatile Size: 32 GB
Cache Size: None
Logical Size: None
Handle 0x0049, DMI type 17, 84 bytes
Memory Device
Array Handle: 0x0023
Error Information Handle: 0x0048
Total Width: Unknown
Data Width: Unknown
Size: No Module Installed
Form Factor: Unknown
Set: None
Locator: DIMMG1
Bank Locator: P0_Node0_Channel6_Dimm0
Type: Unknown
Type Detail: Unknown
Handle 0x004B, DMI type 17, 84 bytes
Memory Device
Array Handle: 0x0023
Error Information Handle: 0x004A
Total Width: 72 bits
Data Width: 64 bits
Size: 32 GB
Form Factor: DIMM
Set: None
Locator: DIMMG2
Bank Locator: P0_Node0_Channel6_Dimm1
Type: DDR4
Type Detail: Synchronous Registered (Buffered)
Speed: 3200 MT/s
Manufacturer: Samsung
Serial Number: T0FN00014948F02273
Asset Tag: DIMMG2_AssetTag (date:21/49)
Part Number: M393A4K40EB3-CWE
Rank: 2
Configured Memory Speed: 3200 MT/s
Minimum Voltage: 1.2 V
Maximum Voltage: 1.2 V
Configured Voltage: 1.2 V
Memory Technology: DRAM
Memory Operating Mode Capability: Volatile memory
Firmware Version: M393A4K40EB3-CWE
Module Manufacturer ID: Bank 1, Hex 0xCE
Module Product ID: Unknown
Memory Subsystem Controller Manufacturer ID: Unknown
Memory Subsystem Controller Product ID: Unknown
Non-Volatile Size: None
Volatile Size: 32 GB
Cache Size: None
Logical Size: None
Handle 0x004E, DMI type 17, 84 bytes
Memory Device
Array Handle: 0x0023
Error Information Handle: 0x004D
Total Width: Unknown
Data Width: Unknown
Size: No Module Installed
Form Factor: Unknown
Set: None
Locator: DIMMH1
Bank Locator: P0_Node0_Channel7_Dimm0
Type: Unknown
Type Detail: Unknown
Handle 0x0050, DMI type 17, 84 bytes
Memory Device
Array Handle: 0x0023
Error Information Handle: 0x004F
Total Width: 72 bits
Data Width: 64 bits
Size: 32 GB
Form Factor: DIMM
Set: None
Locator: DIMMH2
Bank Locator: P0_Node0_Channel7_Dimm1
Type: DDR4
Type Detail: Synchronous Registered (Buffered)
Speed: 3200 MT/s
Manufacturer: Samsung
Serial Number: T0FN00014948F02271
Asset Tag: DIMMH2_AssetTag (date:21/49)
Part Number: M393A4K40EB3-CWE
Rank: 2
Configured Memory Speed: 3200 MT/s
Minimum Voltage: 1.2 V
Maximum Voltage: 1.2 V
Configured Voltage: 1.2 V
Memory Technology: DRAM
Memory Operating Mode Capability: Volatile memory
Firmware Version: M393A4K40EB3-CWE
Module Manufacturer ID: Bank 1, Hex 0xCE
Module Product ID: Unknown
Memory Subsystem Controller Manufacturer ID: Unknown
Memory Subsystem Controller Product ID: Unknown
Non-Volatile Size: None
Volatile Size: 32 GB
Cache Size: None
Logical Size: None
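To cross-check the two views, the dmidecode records can be tallied and compared with the table printed by --layout. What follows is a minimal Python sketch, not part of rasdaemon: the helper name populated_dimms and the shortened SAMPLE text are illustrative assumptions, and the parser assumes the record layout shown in this report, where "Size" precedes "Locator" within each record.

```python
import re

# Shortened excerpt of the `dmidecode --type 17` output from this report
# (two of the sixteen records, with most fields dropped).
SAMPLE = """\
Handle 0x002B, DMI type 17, 84 bytes
Memory Device
Size: No Module Installed
Locator: DIMMA1
Bank Locator: P0_Node0_Channel0_Dimm0
Handle 0x002D, DMI type 17, 84 bytes
Memory Device
Size: 32 GB
Locator: DIMMA2
Bank Locator: P0_Node0_Channel0_Dimm1
Volatile Size: 32 GB
"""


def populated_dimms(dmidecode_text):
    """Return (locator, size_gb) for every populated slot in the given text."""
    dimms = []
    size_gb = None
    for raw in dmidecode_text.splitlines():
        line = raw.strip()
        if line.startswith("Handle "):
            size_gb = None  # a new record starts, forget the previous size
        # fullmatch keeps "Volatile Size: 32 GB" from being counted twice
        match = re.fullmatch(r"Size: (\d+) GB", line)
        if match:
            size_gb = int(match.group(1))
        elif line.startswith("Locator: ") and size_gb is not None:
            dimms.append((line[len("Locator: "):], size_gb))
            size_gb = None
    return dimms


print(populated_dimms(SAMPLE))

# Figures taken from the outputs in this report:
dmi_total_mb = 8 * 32 * 1024   # eight populated 32 GB slots (DIMMA2..DIMMH2)
layout_total_mb = 4 * 32767    # four non-zero cells in the --layout table
print(dmi_total_mb, layout_total_mb)
```

On the real machine the function would be fed the full output of sudo dmidecode --type 17. With the figures from this report, the --layout table accounts for roughly half of the memory that DMI and free -g report.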
jhoblitt commented
The layout as displayed by sysfs is wrong too, so this may be more of a kernel issue than anything else.
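One way to look at what the kernel itself exposes is to read the per-DIMM size files under the EDAC sysfs tree. The sketch below is in Python and assumes the documented layout /sys/devices/system/edac/mc/mcN/dimmM/size, with some drivers exposing rankM instead of dimmM. The function name read_edac_sizes is hypothetical, and the root parameter exists only so the function can be pointed at a different directory for testing.

```python
from pathlib import Path


def read_edac_sizes(root="/sys/devices/system/edac/mc"):
    """Map 'mcN/dimmM' (or 'mcN/rankM') to the size in MB that EDAC reports."""
    base = Path(root)
    if not base.is_dir():
        return {}  # EDAC not loaded, or not running on Linux
    sizes = {}
    for size_file in sorted(base.glob("mc*/*/size")):
        entry = size_file.parent
        # Only per-DIMM / per-rank directories are of interest here.
        if not entry.name.startswith(("dimm", "rank")):
            continue
        key = f"{entry.parent.name}/{entry.name}"
        sizes[key] = int(size_file.read_text().strip())
    return sizes


for name, size_mb in read_edac_sizes().items():
    print(f"{name}: {size_mb} MB")
```

If the sizes printed here already disagree with the installed DIMMs, the discrepancy originates below ras-mc-ctl, which only formats what sysfs provides.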
mchehab commented
First of all, the BIOS information is not always correct. It is actually common for the same BIOS to be used on different machines that have different motherboard silk screens and/or different numbers of sockets. So, neither the kernel nor rasdaemon relies on it.
If the BIOS is reliable enough, one could use:
ras-mc-ctl --guess-labels
memory stick 'ChannelA-DIMM0' is located at 'BANK 0'
memory stick 'ChannelB-DIMM0' is located at 'BANK 2'
With such information (which comes from DMI decoding), one can update the layouts inside the labels/ directory.
jhoblitt commented
--guess-labels looks promising, but it seems to write to stdout only. Is there a way to machine-generate the label db?
[root@pillan06 ~]# ras-mc-ctl --guess-labels
memory stick 'DIMMA1' is located at 'P0_Node0_Channel0_Dimm0'
memory stick 'DIMMA2' is located at 'P0_Node0_Channel0_Dimm1'
memory stick 'DIMMB1' is located at 'P0_Node0_Channel1_Dimm0'
memory stick 'DIMMB2' is located at 'P0_Node0_Channel1_Dimm1'
memory stick 'DIMMC1' is located at 'P0_Node0_Channel2_Dimm0'
memory stick 'DIMMC2' is located at 'P0_Node0_Channel2_Dimm1'
memory stick 'DIMMD1' is located at 'P0_Node0_Channel3_Dimm0'
memory stick 'DIMMD2' is located at 'P0_Node0_Channel3_Dimm1'
memory stick 'DIMME1' is located at 'P0_Node0_Channel4_Dimm0'
memory stick 'DIMME2' is located at 'P0_Node0_Channel4_Dimm1'
memory stick 'DIMMF1' is located at 'P0_Node0_Channel5_Dimm0'
memory stick 'DIMMF2' is located at 'P0_Node0_Channel5_Dimm1'
memory stick 'DIMMG1' is located at 'P0_Node0_Channel6_Dimm0'
memory stick 'DIMMG2' is located at 'P0_Node0_Channel6_Dimm1'
memory stick 'DIMMH1' is located at 'P0_Node0_Channel7_Dimm0'
memory stick 'DIMMH2' is located at 'P0_Node0_Channel7_Dimm1'
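Until there is a built-in way, the stdout format is regular enough to parse. The following Python sketch turns --guess-labels output into structured records. The name parse_guess_labels is hypothetical, and reading the P0_Node0_ChannelN_DimmM string as (socket, node, channel, dimm) is an assumption based on this board's naming; other BIOSes use different bank strings (for example 'BANK 0'), in which case no position is extracted. The sketch does not write a labels file, because mapping these positions to the EDAC mc/csrow/channel indices is board specific and cannot be derived from this output alone.

```python
import re

# Lines copied from the --guess-labels outputs in this thread.
SAMPLE = """\
memory stick 'DIMMA1' is located at 'P0_Node0_Channel0_Dimm0'
memory stick 'DIMMA2' is located at 'P0_Node0_Channel0_Dimm1'
memory stick 'ChannelA-DIMM0' is located at 'BANK 0'
"""

LINE_RE = re.compile(r"memory stick '([^']+)' is located at '([^']+)'")
# Assumed naming scheme: P<socket>_Node<node>_Channel<channel>_Dimm<dimm>
BANK_RE = re.compile(r"P(\d+)_Node(\d+)_Channel(\d+)_Dimm(\d+)")


def parse_guess_labels(text):
    """Return one dict per 'memory stick' line found in the text."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if not match:
            continue  # skip prompts and unrelated lines
        label, bank = match.groups()
        bank_match = BANK_RE.fullmatch(bank)
        position = None
        if bank_match:
            position = tuple(int(part) for part in bank_match.groups())
        entries.append({"label": label, "bank": bank, "position": position})
    return entries


for entry in parse_guess_labels(SAMPLE):
    print(entry["label"], entry["bank"], entry["position"], sep="\t")
```

The resulting records could serve as input for a hand-reviewed labels entry once the mapping to the EDAC indices for this motherboard is known.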