KarypisLab/GKlib

Large 2000x Performance Regression from 5.1 to 5.2


Using METIS 5.1:

pyfr -p partition 8 -ebalanced -pmetis inc-cylinder.pyfrm foo/
 • Combine mesh parts (0.02s)
 • Construct graph (0.00s)
 • Partition graph (0.01s)
 • Renumber vertices (0.03s)
 • Repartition mesh (0.01s)
 • Write mesh (0.01s)

where the partitioning and renumbering steps (both of which call METIS_PartGraphRecursive) complete almost immediately. By contrast, using METIS 5.2.1:

pyfr -p partition 8 -ebalanced -pmetis inc-cylinder.pyfrm foo/
 • Combine mesh parts (0.01s)
 • Construct graph (0.00s)
 • Partition graph (17.33s)
 • Renumber vertices (7.22s)
 • Repartition mesh (0.01s)
 • Write mesh (0.01s)

where we can see a huge slowdown (roughly 2000x) in the partition graph step, which makes a single call to METIS_PartGraphRecursive. The inputs are identical in both cases. The slowdown also reproduces with METIS_PartGraphKway, and on both Linux (x86-64) and macOS (AArch64).
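For reference, the per-stage slowdown can be computed directly from the two logs above. This is a small sketch using only the timings printed by pyfr; the dictionary and function names are illustrative and not part of pyfr or METIS.

```python
# Stage timings in seconds, copied from the two pyfr runs above.
timings_51 = {
    "Combine mesh parts": 0.02,
    "Construct graph": 0.00,
    "Partition graph": 0.01,
    "Renumber vertices": 0.03,
    "Repartition mesh": 0.01,
    "Write mesh": 0.01,
}
timings_521 = {
    "Combine mesh parts": 0.01,
    "Construct graph": 0.00,
    "Partition graph": 17.33,
    "Renumber vertices": 7.22,
    "Repartition mesh": 0.01,
    "Write mesh": 0.01,
}


def slowdown(old, new):
    """Return new/old per stage, skipping stages whose old time prints as 0.00s."""
    return {
        stage: new[stage] / old[stage]
        for stage in old
        if stage in new and old[stage] > 0
    }


ratios = slowdown(timings_51, timings_521)
for stage, ratio in ratios.items():
    print(f"{stage}: {ratio:.0f}x")
```

Because the log only has 0.01s resolution, the 0.01s baseline for the partition step is coarse: a true value anywhere between 0.005s and 0.015s would put the ratio between roughly 1150x and 3500x, which is why the figure is best read as an order of magnitude.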

This occurs with all of our grids/meshes. Profiling 5.2.1 with perf record, we find:

    17.05%  pyfr      libmetis.so.0                                      [.] libmetis__FM_Mc2WayCutRefine
     8.88%  pyfr      libmetis.so.0                                      [.] libmetis__CreateCoarseGraph
     8.03%  pyfr      libmetis.so.0                                      [.] libmetis__FM_2WayCutRefine
     7.93%  pyfr      libmetis.so.0                                      [.] libmetis__rpqInsert
     5.24%  pyfr      libmetis.so.0                                      [.] libmetis__rpqUpdate
     5.12%  pyfr      libc.so.6                                          [.] random
     4.21%  pyfr      libmetis.so.0                                      [.] libmetis__Compute2WayPartitionParams
     4.21%  pyfr      libmetis.so.0                                      [.] libmetis__rpqGetTop
     4.21%  pyfr      libmetis.so.0                                      [.] libmetis__Match_SHEM
     4.01%  pyfr      libmetis.so.0                                      [.] libmetis__SelectQueue
     2.79%  pyfr      libmetis.so.0                                      [.] libmetis__iset
     2.77%  pyfr      libmetis.so.0                                      [.] libmetis__Project2WayPartition
     2.20%  pyfr      libmetis.so.0                                      [.] libmetis__Match_RM
     1.99%  pyfr      libmetis.so.0                                      [.] libmetis__ComputeLoadImbalanceDiffVec
     1.93%  pyfr      libmetis.so.0                                      [.] libmetis__McGeneral2WayBalance
     1.85%  pyfr      libmetis.so.0                                      [.] libmetis__iaxpy
     1.34%  pyfr      libmetis.so.0                                      [.] libmetis__rpqDelete
     1.22%  pyfr      libmetis.so.0                                      [.] libmetis__BucketSortKeysInc

whereas with 5.1 (good) we find:

    10.18%  pyfr      libopenblas64_p-r0-15028c96.3.21.so                [.] blas_thread_server
     9.73%  pyfr      [unknown]                                          [k] 0xffffffff900001a2
     9.30%  pyfr      libc.so.6                                          [.] __sched_yield
     8.22%  pyfr      libpython3.11.so.1.0                               [.] _PyEval_EvalFrameDefault
     1.00%  pyfr      libpython3.11.so.1.0                               [.] 0x0000000000192fb0
     0.96%  pyfr      libpython3.11.so.1.0                               [.] 0x00000000001949c0
     0.80%  pyfr      libmetis.so.0                                      [.] libmetis__FM_Mc2WayCutRefine
     0.59%  pyfr      libpython3.11.so.1.0                               [.] _PyType_Lookup
     0.57%  pyfr      libmetis.so.0                                      [.] libmetis__rpqInsert

where METIS is just a rounding error in the runtime.
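To put a number on this, the overhead column can be summed per shared object. The following is a rough sketch that assumes the whitespace-separated layout shown above (overhead, command, shared object, symbol type, symbol); the helper name `share_by_dso` is hypothetical, and the sample is just the first few lines of the 5.2.1 profile.

```python
from collections import defaultdict


def share_by_dso(report: str) -> dict[str, float]:
    """Sum the overhead percentage per shared object in perf report text."""
    totals = defaultdict(float)
    for line in report.splitlines():
        fields = line.split()
        # Expected layout: overhead%, command, shared object, [.] or [k], symbol.
        # Anything else (blank lines, wrapped fragments) is skipped.
        if len(fields) < 5 or not fields[0].endswith("%"):
            continue
        totals[fields[2]] += float(fields[0].rstrip("%"))
    return dict(totals)


# A few lines copied from the 5.2.1 profile above.
sample_521 = """\
    17.05%  pyfr      libmetis.so.0      [.] libmetis__FM_Mc2WayCutRefine
     8.88%  pyfr      libmetis.so.0      [.] libmetis__CreateCoarseGraph
     5.12%  pyfr      libc.so.6          [.] random
"""

shares = share_by_dso(sample_521)
for dso, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{dso}: {pct:.2f}%")
```

Applied to the entries listed above, libmetis.so.0 accounts for about 80% of samples with 5.2.1 versus about 1.4% with 5.1. Both listings may be truncated, so these are lower bounds on the true shares.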