[DOCS] incomplete/incorrect statements regarding marker API
The wiki (https://github.com/RRZE-HPC/likwid/wiki/likwid-perfctr#using-the-marker-api) states the following:
For a threaded code it is important to call the following sequence of function calls from the serial part of the program:
LIKWID_MARKER_INIT;
[...]
LIKWID_MARKER_CLOSE;
But if any OpenMP region is opened before the LIKWID_MARKER_INIT call, the internal data structures are incorrect (or at least may be, depending on the underlying CPU/node architecture), and counters are read incorrectly.
E.g. on A64FX with 4 ranks and 6 threads each, trying to read EA_L2 results in rank 0 / thread 0 reading the counter (so far so good), but also in rank 1 / threads 0+1, rank 2 / threads 0+1, and rank 3 / threads 0+1 reading the same counter. Thread 1 should not read it, but does so because of an incorrectly created internal topology data structure.
The bug with multiple threads reading/reporting counters they should not access (Marker API only) seems to go away when a topology file, generated via likwid-genTopoCfg, is present on the node. I assume the topology detection used when there is no topology file has bugs that need to be fixed, or the topology should not be recreated for threads within the marker ROI. In any case, if you want to reproduce the issue, I suggest starting with this command on an A64FX (or another node with multiple NUMA domains):
mpirun -np 4 -x OMP_NUM_THREADS=6 -x OMP_PROC_BIND=close -x XOS_MMM_L_ARENA_LOCK_TYPE=0 -x XOS_MMM_L_HPAGE_TYPE=hugetlbfs -x XOS_MMM_L_PAGING_POLICY=demand:demand:demand --mca btl ^openib,tcp --oversubscribe --map-by slot:pe=6 --bind-to core:overload-allowed --tag-output --merge-stderr-to-stdout likwid-perfctr --marker -g ENERGY ./stream_f.exe
(see PR #603 for the ENERGY.txt file)
The issue comes from the changed CPUsets in both cases. When an application is started through LIKWID, it initially has a CPUset containing all selected HWthreads. If LIKWID_MARKER_INIT is called at this point, it "sees" all HWthreads that may take part in the computation. As soon as a Pthread is started (e.g. by OpenMP), LIKWID's pinning library pins the application's master thread to the first HWthread and the workers to consecutive HWthreads in the CPUset. If LIKWID_MARKER_INIT is executed afterwards, it "sees" only the master thread's single-HWthread CPUset.
If the topology file is provided, the application as well as all started threads read their topology from the file. This includes the CPUset (commonly all HWthreads are allowed, because likwid-genTopoCfg is rarely executed in environments with a limited CPUset).