intel/intel-cmt-cat

MBA doesn't work on Intel(R) Xeon(R) Gold 6226R CPU

IteratorandIterator opened this issue · 13 comments

  1. I followed the official tutorial to compile and install intel-cmt-cat.
  2. Then, I started a process with a target read bandwidth of 10000 MB/s using sudo /usr/local/bin/membw -c 31 -b 10000 --read.
  3. After that, I associated core 31 with COS7 and set an MBA limit for COS7 using sudo pqos-os -a 'cos:7=31' && sudo pqos-os -e 'mba_max:7=500'.
  4. Finally, I checked the process bandwidth with pidof membw && sudo pqos-os -p mbl:<pid>.
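For anyone trying to reproduce this, the four steps can be collected into one small script so the experiment is repeated exactly. This is only a sketch, not part of intel-cmt-cat: it assumes membw and pqos-os are on PATH, the run helper and the DRY_RUN switch are hypothetical additions (with DRY_RUN=1, the default here, commands are printed instead of executed), and the mba_max value is passed through exactly as in step 3.

```shell
#!/bin/sh
# Sketch: the reproduction steps above, using the OS (resctrl) interface.
# DRY_RUN=1 (default here) only prints each command; DRY_RUN=0 executes it.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: print or execute a command depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reproduce_os() {
    core=31   # logical core that runs the load generator
    cos=7     # class of service (COS) to throttle

    # Step 2: start a ~10000 MB/s read load pinned to $core, in the background.
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ sudo membw -c $core -b 10000 --read &"
    else
        sudo membw -c "$core" -b 10000 --read &
        sleep 1   # give membw a moment to start before looking up its PID
    fi

    # Step 3: associate the core with the COS, then apply the mba_max limit.
    run sudo pqos-os -a "cos:$cos=$core" &&
        run sudo pqos-os -e "mba_max:$cos=500"

    # Step 4: monitor local memory bandwidth (mbl) of the membw process.
    # "PID" is a placeholder printed when membw is not running.
    pid=$(pidof -s membw 2>/dev/null || echo PID)
    run sudo pqos-os -p "mbl:$pid"
}

reproduce_os
```

With DRY_RUN=0 the same commands run in the same order; membw keeps running in the background and has to be stopped by hand afterwards.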

However, the process bandwidth was not limited at all; it was the same as when MBA was not used. Why is this the case?
Below is my basic configuration information:

lscpu:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6226R CPU @ 2.90 GHz
Stepping: 7
CPU MHz: 3600.000
CPU max MHz: 3900.0000
CPU min MHz: 1200.0000
BogoMIPS: 5800.00
Virtualization: VT-x
L1d cache: 512 KiB
L1i cache: 512 KiB
L2 cache: 16 MiB
L3 cache: 22 MiB
NUMA node0 CPU(s): 0-31

uname -a
Linux HM1 5.15.0-97-generic #107~20.04.1-Ubuntu SMP Fri Feb 9 14:20:11 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Any reply would be appreciated!

Could you please provide me with the output of the following command?
$ LD_LIBRARY_PATH=lib ./pqos/pqos --iface=msr -D

You should run it from within your source code root directory.

Also, could you please make the same experiment using MSR interface?
$ LD_LIBRARY_PATH=lib ./pqos/pqos --iface=msr <your_cmd>

Finally, after running the tests please obtain the following information:
LD_LIBRARY_PATH=lib ./pqos/pqos --iface=msr -s


Thanks for your reply!

The output of "[zlx@HM1:~/utils/RDT/intel-cmt-cat] $ LD_LIBRARY_PATH=lib ./pqos/pqos --iface=msr -D" is:

NOTE: Mixed use of MSR and kernel interfaces to manage
CAT or CMT & MBM may lead to unexpected behavior.
API lock initialization error!
Error initializing PQoS library!

The output of "[zlx@HM1:~/utils/RDT/intel-cmt-cat] $ LD_LIBRARY_PATH=lib ./pqos/pqos --iface=msr -D" is:

NOTE: Mixed use of MSR and kernel interfaces to manage
CAT or CMT & MBM may lead to unexpected behavior.
WARN: resctl filesystem mounted! Using MSR interface may corrupt resctrl filesystem and cause unexpected behaviour
Hardware capabilities
  Monitoring
    Cache Monitoring Technology (CMT) events:
      LLC Occupancy (LLC)
        I/O RDT: unsupported
        scale factor: 65536
        max rmid: 128
        counter length: 24b
    Memory Bandwidth Monitoring (MBM) events:
      Total Memory Bandwidth (TMEM)
        I/O RDT: unsupported
        scale factor: 65536
        max rmid: 128
        counter length: 24b
      Local Memory Bandwidth (LMEM)
        I/O RDT: unsupported
        scale factor: 65536
        max rmid: 128
        counter length: 24b
      Remote Memory Bandwidth (RMEM) (calculated)
        I/O RDT: unsupported
        scale factor: 65536
        max rmid: 128
        counter length: 24b
    PMU events:
      Instructions/Clock (IPC)
      LLC misses
      LLC references
      LLC misses - pcie read
      LLC misses - pcie write
      LLC references - pcie read
      LLC references - pcie write
  Allocation
    Cache Allocation Technology (CAT)
      L3 CAT
        CDP: enabled
        Non-Contiguous CBM: unsupported
        I/O RDT: unsupported
        Num COS: 8
        Way size: 2097152 bytes
        Ways contention bit-mask: 0x600
        Min CBM bits: 1
        Max CBM bits: 11
    Memory Bandwidth Allocation (MBA)
      Num COS: 8
      Granularity: 10
      Min B/W: 10
      Type: linear
      MBA 4.0 extensions: unsupported
Cache information
  L3 Cache
    Num ways: 11
    Way size: 2097152 bytes
    Num sets: 32768
    Line size: 64 bytes
    Total size: 23068672 bytes
  L2 Cache
    Num ways: 16
    Way size: 65536 bytes
    Num sets: 1024
    Line size: 64 bytes
    Total size: 1048576 bytes

After running the tests, here is the output of LD_LIBRARY_PATH=lib ./pqos/pqos --iface=msr -s, as requested:

NOTE: Mixed use of MSR and kernel interfaces to manage
CAT or CMT & MBM may lead to unexpected behavior.
WARN: resctl filesystem mounted! Using MSR interface may corrupt resctrl filesystem and cause unexpected behaviour
L3CA/MBA COS definitions for Socket 0:
L3CA COS0 => DATA 0x7ff, CODE 0x7ff
L3CA COS1 => DATA 0x7ff, CODE 0x7ff
L3CA COS2 => DATA 0x7ff, CODE 0x7ff
L3CA COS3 => DATA 0x7ff, CODE 0x7ff
L3CA COS4 => DATA 0x7ff, CODE 0x7ff
L3CA COS5 => DATA 0x7ff, CODE 0x7ff
L3CA COS6 => DATA 0x7ff, CODE 0x7ff
L3CA COS7 => DATA 0x7ff, CODE 0x7ff
MBA COS0 => 20% available
MBA COS1 => 100% available
MBA COS2 => 100% available
MBA COS3 => 100% available
MBA COS4 => 100% available
MBA COS5 => 100% available
MBA COS6 => 100% available
MBA COS7 => 10% available
Core information for socket 0:
Core 0, L2ID 0, L3ID 0 => COS0, RMID84
Core 1, L2ID 1, L3ID 0 => COS0, RMID85
Core 2, L2ID 2, L3ID 0 => COS0, RMID86
Core 3, L2ID 3, L3ID 0 => COS0, RMID87
Core 4, L2ID 4, L3ID 0 => COS0, RMID89
Core 5, L2ID 5, L3ID 0 => COS0, RMID90
Core 6, L2ID 6, L3ID 0 => COS0, RMID60
Core 7, L2ID 7, L3ID 0 => COS0, RMID61
Core 8, L2ID 8, L3ID 0 => COS0, RMID62
Core 9, L2ID 9, L3ID 0 => COS0, RMID63
Core 10, L2ID 10, L3ID 0 => COS0, RMID64
Core 11, L2ID 11, L3ID 0 => COS0, RMID65
Core 12, L2ID 12, L3ID 0 => COS0, RMID66
Core 13, L2ID 13, L3ID 0 => COS0, RMID67
Core 14, L2ID 14, L3ID 0 => COS0, RMID68
Core 15, L2ID 15, L3ID 0 => COS0, RMID69
Core 16, L2ID 0, L3ID 0 => COS0, RMID70
Core 17, L2ID 1, L3ID 0 => COS0, RMID71
Core 18, L2ID 2, L3ID 0 => COS0, RMID72
Core 19, L2ID 3, L3ID 0 => COS0, RMID73
Core 20, L2ID 4, L3ID 0 => COS0, RMID74
Core 21, L2ID 5, L3ID 0 => COS0, RMID75
Core 22, L2ID 6, L3ID 0 => COS0, RMID88
Core 23, L2ID 7, L3ID 0 => COS0, RMID91
Core 24, L2ID 8, L3ID 0 => COS0, RMID108
Core 25, L2ID 9, L3ID 0 => COS0, RMID109
Core 26, L2ID 10, L3ID 0 => COS0, RMID110
Core 27, L2ID 11, L3ID 0 => COS0, RMID112
Core 28, L2ID 12, L3ID 0 => COS0, RMID113
Core 29, L2ID 13, L3ID 0 => COS0, RMID114
Core 30, L2ID 14, L3ID 0 => COS0, RMID115
Core 31, L2ID 15, L3ID 0 => COS7, RMID122

Absolutely, I'd be happy to repeat the experiment using the MSR interface as well!

First, I started a process with a target read bandwidth of 10000 MB/s using [zlx@HM1:~/utils/RDT/intel-cmt-cat] $ sudo membw -c 31 -b 10000 --read
The output of the command is "- THREAD logical core id: 31, memory bandwidth [MB]: 10000, starting…"

Then, I ran [zlx@HM1:~/utils/RDT/intel-cmt-cat] $ sudo LD_LIBRARY_PATH=lib ./pqos/pqos --iface=msr -a 'cos:7=31'
The output of the command is "
NOTE: Mixed use of MSR and kernel interfaces to manage
CAT or CMT & MBM may lead to unexpected behavior.
WARN: resctl filesystem mounted! Using MSR interface may corrupt resctrl filesystem and cause unexpected behaviour
Allocation configuration altered. "

Next, I ran sudo LD_LIBRARY_PATH=lib ./pqos/pqos --iface=msr -e 'mba:7=10'
The output of the command is "
NOTE: Mixed use of MSR and kernel interfaces to manage
CAT or CMT & MBM may lead to unexpected behavior.
WARN: resctl filesystem mounted! Using MSR interface may corrupt resctrl filesystem and cause unexpected behaviour
SOCKET 0 MBA COS7 => 10% requested, 10% applied
Allocation configuration altered. "

Finally, I checked the process bandwidth with sudo LD_LIBRARY_PATH=lib ./pqos/pqos --iface=msr -m mbl:31, which printed:
NOTE: Mixed use of MSR and kernel interfaces to manage
CAT or CMT & MBM may lead to unexpected behavior.
WARN: resctl filesystem mounted! Using MSR interface may corrupt resctrl filesystem and cause unexpected behaviour
ERROR: Monitoring on core 31 is already started
Monitoring start error on core(s) 31, status 3

I used tools such as top, ps aux | grep pqos, and pidof pqos, but could not find any running pqos, pqos-os or pqos-msr processes. I am also certain that I did not leave any pqos or pqos-os process running in the background.

Regarding the "Monitoring on core 31 is already started" error with no pqos process running: typically this happens when a process fails for some reason but the MSR/resctrl settings have not been cleared.
I would recommend restarting the machine, repeating the experiment using only the MSR interface, and reporting the results, that is, whether the bandwidth is throttled.

Also, please provide the following information:

  1. The version of the code used for the build: the 'master' branch or a particular tag.
  2. The OS version.
  3. The kernel version ($ uname -a).


Thank you! Since the server is shared by multiple people, I will report the test results as soon as we have agreed on a restart time.

  1. Code version: the 'master' branch
  2. OS version: Ubuntu 20.04 Desktop
  3. Kernel version: 5.15.0-97-generic

OK, thanks for the information.
Let me suggest how to address some of the issues you encountered:

  • The following error appears in the output:
    API lock initialization error!
    Error initializing PQoS library!
    This can typically be solved by removing the lock file at “/var/lock/libpqos”.

  • The warning below also appears in a few places:
    WARN: resctl filesystem mounted! Using MSR interface may corrupt resctrl filesystem and cause unexpected behaviour
    I would recommend unmounting resctrl before running experiments with the MSR interface.

ERROR: Monitoring on core 31 is already started
Monitoring start error on core(s) 31, status 3

If the pqos utility was killed while monitoring, you should be able to force-start monitoring by resetting monitoring with pqos -r. No reboot should be required.
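The three suggestions above can be combined into one preparation step before an MSR-only run. This is a minimal sketch, assuming it is run from the source root (so that LD_LIBRARY_PATH=lib ./pqos/pqos resolves) and that DRY_RUN=1, the default here, should only print the commands; prepare_msr and run are hypothetical helpers, and passing -r together with the -m mbl:31 option used earlier is an assumption about how the pqos -r advice is meant to be applied.

```shell
#!/bin/sh
# Sketch: clean up before an MSR-only experiment (run from the source root).
# DRY_RUN=1 (default here) only prints each command; DRY_RUN=0 executes it.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: print or execute a command depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1: mount table to inspect (defaults to /proc/mounts)
prepare_msr() {
    mounts=${1:-/proc/mounts}

    # 1. Remove a stale lock file behind "API lock initialization error!".
    run sudo rm -f /var/lock/libpqos

    # 2. Unmount resctrl if it is mounted, so MSR writes cannot conflict with it.
    mnt=$(awk '$3 == "resctrl" { print $2; exit }' "$mounts")
    if [ -n "$mnt" ]; then
        run sudo umount "$mnt"
    fi

    # 3. Start monitoring core 31 with a monitoring reset (-r) to clear a
    #    stale "Monitoring on core 31 is already started" state (assumption).
    run sudo LD_LIBRARY_PATH=lib ./pqos/pqos --iface=msr -r -m mbl:31
}
```

Calling prepare_msr prints the three commands (skipping umount when resctrl is not mounted); DRY_RUN=0 prepare_msr would run them.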

Please let me know when the results of the MSR experiments are ready.


Thanks! When I use --iface=msr, I can achieve the effect of limiting bandwidth, but it doesn't work when I use --iface=os. I don't know why this is the case.
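Since the MSR path throttles but the OS path does not, one way to narrow this down is to look at what the OS interface actually wrote into resctrl. According to the kernel's resctrl documentation, the MB line of a schemata file holds percentages unless resctrl is mounted with the mba_MBps option, in which case values are bandwidth in MBps; so for an mba_max request it may be worth checking both. This is a hedged sketch, not a confirmed diagnosis: it assumes resctrl is mounted at /sys/fs/resctrl and that pqos-os created a group directory per class of service (assumed here to be named COS7); mb_value and resctrl_mba_mbps are hypothetical helpers.

```shell
#!/bin/sh
# Sketch: inspect what the OS interface programmed in resctrl.

# Print the MB (memory bandwidth) value for one domain of a schemata file.
# $1: schemata file, $2: domain (socket) id
mb_value() {
    awk -v dom="$2" '
        { gsub(/[ \t]/, "") }   # the kernel may pad names and values with spaces
        /^MB:/ {
            n = split(substr($0, 4), parts, ";")
            for (i = 1; i <= n; i++) {
                split(parts[i], kv, "=")
                if (kv[1] == dom) print kv[2]
            }
        }' "$1"
}

# Succeed if resctrl is mounted with the mba_MBps option.
# $1: mount table (defaults to /proc/mounts)
resctrl_mba_mbps() {
    awk '$3 == "resctrl" && $4 ~ /(^|,)mba_MBps(,|$)/ { found = 1 }
         END { exit !found }' "${1:-/proc/mounts}"
}
```

For example, mb_value /sys/fs/resctrl/COS7/schemata 0 shows the value applied to socket 0, and resctrl_mba_mbps || echo "mba_MBps not set" reports whether the option is active.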