LPD-EPFL/mctop

Assertion `gs->type == CORE' failed

proywm opened this issue · 15 comments

The mctop run is getting aborted. The lstopo and cpuinfo outputs are attached.


MCTOP Settings:
Machine name : server
Output : MCT description file
Repetitions : 2000
Do-memory : Latency+Bandwidth on topology
Mem. size bw : 512 MB
Cluster-offset : 20
Max std dev : 7
# Cores : 64
# Sockets : 4
# Hint : 0 clusters
CPU DVFS : 1 (Freq. up in 160 ms)
Progress : 100.0% completed in 49.3 secs (step took 0.2 secs)

CDF Clusters
#0 : size 3 / range 0 - 20 / median: 16
#1 : size 16 / range 72 - 108 / median: 94
#2 : size 162 / range 456 - 972 / median: 690
##############################################################
CPU is SMT: 1
Lat table
MCTOP output in: ./desc/server.mct
##########################################################################
mctop: src/mctop_topology.c:851: mctop_fix_n_hwcs_per_core_smt: Assertion `gs->type == CORE' failed.
Aborted


OS configuration:
Linux server 3.10.0-229.4.2.el7.x86_64 #1 SMP

cpuinfo.txt
lstopo.txt

Hello,

Thank you for reporting the issue. I just pushed a commit that should fix the problem.
In brief, the latencies of hardware contexts were incorrectly clustered together with the 0-latency measurements, so the separate group between a single hw context and the two hw contexts that share a core was not being created properly.
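To illustrate the idea, here is a minimal sketch of that clustering step (not mctop's actual code): sorted pairwise latencies are split into clusters wherever the gap to the previous value exceeds the cluster offset, and the 0-latency measurements are kept in a group of their own.

    #include <stdio.h>

    /* Minimal sketch, not mctop's actual code: split sorted pairwise
     * latencies into clusters wherever the gap to the previous value
     * exceeds the cluster offset, keeping the 0-latency measurements
     * (hw contexts sharing a core) in a cluster of their own. */
    static void cluster_latencies(const unsigned *lat, size_t n, unsigned offset)
    {
        size_t start = 0, c = 0;
        for (size_t i = 1; i <= n; i++) {
            int split = (i == n) ||
                        (lat[i] - lat[i - 1] > offset) ||
                        (lat[i - 1] == 0 && lat[i] > 0); /* separate 0-latency group */
            if (split) {
                printf("#%zu : size %zu / range %u - %u\n",
                       c++, i - start, lat[start], lat[i - 1]);
                start = i;
            }
        }
    }

    int main(void)
    {
        /* illustrative values only: 0-latency, SMT sibling, same socket, remote socket */
        unsigned lat[] = { 0, 6, 98, 104, 118, 690, 706, 720 };
        cluster_latencies(lat, sizeof lat / sizeof lat[0], 20);
        return 0;
    }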

Please let me know if now mctop works properly on this machine.

Vasilis.

By the way, there is a huge variance across sockets:
#2 : size 162 / range 456 - 972 / median: 690
and even the minimum 1-hop latency is 456 cycles, which is much higher than anything I have ever seen.

It would be very interesting to see the output of mctop (if successful and you would like to share) :-)

Thanks for the quick reply. Now it is getting a segmentation fault; sometimes it causes a "Floating point exception".


MCTOP Settings:
Machine name : server
Output : MCT description file
Repetitions : 2000
Do-memory : Latency+Bandwidth on topology
Mem. size bw : 512 MB
Cluster-offset : 20
Max std dev : 7
# Cores : 64
# Sockets : 4
# Hint : 0 clusters
CPU DVFS : 1 (Freq. up in 160 ms)
Progress : 100.0% completed in 42.4 secs (step took 0.2 secs)

CDF Clusters
#0 : size 1 / range 0 - 0 / median: 0
#1 : size 1 / range 6 - 6 / median: 6
#2 : size 17 / range 74 - 118 / median: 98
#3 : size 174 / range 450 - 1010 / median: 706
##############################################################
CPU is SMT: 1
Lat table
MCTOP output in: ./desc/server.mct
##########################################################################

Calculating cache latencies / sizes
Looking for L1 cache size (16 to 160 KB, step 16 KB)

CDF Clusters
#0 : size 1 / range 0 - 6 / median: 6
##############################################################
Segmentation fault


MCTOP Settings:
Machine name : server
Output : MCT description file
Repetitions : 2000
Do-memory : Latency+Bandwidth on topology
Mem. size bw : 512 MB
Cluster-offset : 20
Max std dev : 7
# Cores : 64
# Sockets : 4
# Hint : 0 clusters
CPU DVFS : 1 (Freq. up in 160 ms)
Progress : 35.9% completed in 22.7 secs (step took 0.6 secs)
Floating point exception

From what I see, it crashed on "Calculating cache latencies / sizes," which is not an essential part of the topology creation. I disabled this step for now (through a commit) and I will try to investigate when I have the time.

I hope it will work this time :-)

Thanks for your help.

Unfortunately, it is still getting a segfault. I have attached the generated mct file.

CPU is SMT: 1
Lat table
MCTOP output in: ./desc/server.mct
##########################################################################
Segmentation fault

Thanks,
Probir

server.mct.txt

I really don't like these numbers: they shouldn't be that high on a four-socket machine.
(If you replace 908 with 600 in the mct file, loading the topology works.)

Can you send me the output of: ./mctop -m0 -r5000 -f2 -v?

Here is the output.
out.txt

Thanks, it seems that there are a couple of measurements that are off (see the attached screenshot).

Even if they were not off, the current implementation of mctop would not work on such an asymmetric topology.

If you are not bored, one more test that you can run is with the manually fixed topology that I described before. In the server.mct file that you shared earlier, replace 908 with 600 and leave the file in the desc folder. Then, you can execute ./mctop -a to get memory latencies and bandwidths. These measurements will show us whether the asymmetry that we see truly exists.
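For example, something like the following should do it (assuming the server.mct you shared is already under ./desc; sed is just one way to make the replacement):

sed -i 's/908/600/' ./desc/server.mct
./mctop -a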

Thanks again and sorry for all these complications!

One of the ideas that I want to implement at some point (so far I haven't had the need) is to have a backup plan if the topology creation fails: read the topology from the OS and augment it with measurements.

Vasilis.

Here is the output. It seems they are symmetric, right?

latbw.txt

Indeed, they are symmetric, with one weaker link each -- very interesting topology :)

My best "guess" is that the problem is due to:

  1. DVFS -- the latency measurements have a huge spread; and
  2. Due to the low actual difference (as indicated by the mem. latency measurements).

For now, I have made the DVFS handling more aggressive. You could try:
./mctop -f2 -c30 -r5000 -d3

I have some more robust solutions in mind for multi-cores such as this one, but I need to find the time to implement them.

Thanks, once again.

I forgot to mention the -i option of mctop.
./mctop -f2 -c30 -r5000 -d3 -i5 will explicitly try to find a clustering with 5 latency clusters...

Here is the data. Please let me know your observations.

Thanks,
Probir

out2.txt
server.mct.txt

Hi Probir,

The "good" news is that mctop did a very reasonable clustering. The bad news is that there are some outlier values that cannot be clustered together. I wrote two scripts to help us with debugging the problem:

You can invoke:

  • ./scripts/ccbench.sh -x13 -y16 and then ./scripts/ccbench.sh -x14 -y16
  • ./scripts/ccbench.sh -x13 -y50 and then ./scripts/ccbench.sh -x13 -y51 and then ./scripts/ccbench.sh -x14 -y50
  • ./scripts/ccbench_map.sh

Essentially, we are measuring some problematic latencies manually, to figure out if it's mctop's problem.

Output of the scripts:

out_scripts.txt

Well, it is not mctop's problem :-(
Look at the latencies involving Node 0:

0 <-> 1      1 <-> 0
526   376    382   536
0 <-> 2      2 <-> 0
611   503    485   583
0 <-> 3      3 <-> 0
492   335    378   547

It is faster for other nodes to receive data from Node 0 than for Node 0 to access other nodes.

Other than that:

0 <--> 0 : 105.9 / 107.3
0 <--> 1 : 526.6 / 376.2
0 <--> 2 : 611.6 / 503.9
0 <--> 3 : 492.4 / 335.2
1 <--> 0 : 382.2 / 536.0
1 <--> 1 : 120.6 / 121.6
1 <--> 2 : 843.4 / 823.6
1 <--> 3 : 781.9 / 728.2
2 <--> 0 : 485.6 / 583.9
2 <--> 1 : 824.8 / 843.0
2 <--> 2 : 104.0 / 106.8
2 <--> 3 : 786.1 / 771.8
3 <--> 0 : 378.8 / 547.9
3 <--> 1 : 766.2 / 812.3
3 <--> 2 : 777.0 / 785.3
3 <--> 3 : 97.7

the other nodes are quite reasonably connected.

The bottom line is that the current implementation of mctop does not support this type of asymmetry.