cea-hpc/clustershell

Performance on large many-dimensional nodesets

mattaezell opened this issue · 2 comments

HPE Cray EX supercomputers use hardware locations (called xnames) that encode up to 5 dimensions. An example compute node might be x1000c2s3b0n1. We don't use xnames for our compute nodes, but our switches, BMCs, chassis controllers, etc. do use them. Some of our local tooling (as well as clush/cluset) struggles with long lists of xnames, particularly when folding.

[root@admin1.borg ~]# time (cluset -e x[2000-2073]c[0-7]s[0-7]b[0-1] | cluset -f)
x[2000-2073]c[0-7]s[0-7]b[0-1]

real    6m38.483s
user    6m38.228s
sys     0m0.088s

Any thoughts on how we could improve this? Thanks
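
For a sense of the input size, the pattern in the command above can be expanded with plain Python. This is a minimal sketch that uses only the standard library and not ClusterShell's own API; the helper name `expand_xnames` and its parameter names are hypothetical.

```python
from itertools import product

def expand_xnames(xs, cs, ss, bs):
    """Expand x<X>c<C>s<S>b<B> over the given index ranges (one per dimension)."""
    return [f"x{x}c{c}s{s}b{b}" for x, c, s, b in product(xs, cs, ss, bs)]

# Same ranges as x[2000-2073]c[0-7]s[0-7]b[0-1]
names = expand_xnames(range(2000, 2074), range(8), range(8), range(2))

print(len(names))            # 9472 (74 * 8 * 8 * 2)
print(names[0], names[-1])   # x2000c0s0b0 x2073c7s7b1
```

So the fold has to reassemble 9472 individual names, each varying in four dimensions, back into a single pattern.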

Thanks for the report @mattaezell.

A quick look (on the master branch) shows that most of the time is spent in RangeSetND._fold_multivariate_merge(), which does the nD folding:

$ cluset -e x[2000-2073]c[0-7]s[0-7]b[0-1] | python3 -m cProfile -s cumulative lib/ClusterShell/CLI/Nodeset.py -f
x[2000-2073]c[0-7]s[0-7]b[0-1]
         1225113449 function calls (1073087573 primitive calls) in 848.288 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    107/1    0.009    0.000  848.388  848.388 {built-in method exec}
        1    0.000    0.000  848.388  848.388 Nodeset.py:27(<module>)
        1    0.001    0.001  848.258  848.258 Nodeset.py:334(main)
        1    0.000    0.000  848.257  848.257 Nodeset.py:155(nodeset)
        1    0.251    0.251  848.255  848.255 Nodeset.py:44(process_stdin)
    18950    0.028    0.000  846.483    0.045 RangeSet.py:884(inner)
152119181/94356   14.296    0.000  845.921    0.009 {built-in method len}
     9473    0.008    0.000  845.920    0.089 RangeSet.py:1147(_fold)
        3    0.000    0.000  845.911  281.970 NodeSet.py:238(__len__)
        3    0.000    0.000  845.911  281.970 RangeSet.py:926(__len__)
        1    0.000    0.000  845.856  845.856 RangeSet.py:1180(_fold_multivariate)
        1  186.997  186.997  844.685  844.685 RangeSet.py:1245(_fold_multivariate_merge)    <<<
 71670778   86.067    0.000  333.553    0.000 RangeSet.py:533(copy)
 47701583   31.013    0.000  331.814    0.000 RangeSet.py:582(__and__)
 47701583   33.374    0.000  287.534    0.000 RangeSet.py:591(intersection)
 71746554  115.193    0.000  176.797    0.000 RangeSet.py:106(__init__)
 23855527   18.540    0.000  173.677    0.000 RangeSet.py:564(__or__)
 23855527   16.714    0.000  147.369    0.000 RangeSet.py:573(union)
 51414608   57.401    0.000  144.083    0.000 RangeSet.py:542(__eq__)
414319969  110.208    0.000  110.208    0.000 {built-in method isinstance}
 95526306   67.220    0.000   91.474    0.000 RangeSet.py:736(update)
 50787459   38.299    0.000   72.265    0.000 RangeSet.py:652(issubset)
 52033619   17.782    0.000   34.702    0.000 RangeSet.py:676(_binary_sanity_check)
 47701583   31.294    0.000   31.294    0.000 RangeSet.py:703(intersection_update)
 71784438   19.991    0.000   19.991    0.000 RangeSet.py:244(set_autostep)
   623080    0.775    0.000    2.092    0.000 RangeSet.py:670(__gt__)
    18947    0.021    0.000    1.969    0.000 NodeSet.py:1508(update)
     9474    0.041    0.000    1.760    0.000 NodeSet.py:1202(__init__)
    18947    0.031    0.000    1.339    0.000 NodeSet.py:789(parse)
     9472    0.061    0.000    1.296    0.000 NodeSet.py:810(parse_string)
    28419    0.043    0.000    0.982    0.000 NodeSet.py:539(update)
    37889    0.074    0.000    0.959    0.000 NodeSet.py:490(_add)
   623080    0.481    0.000    0.809    0.000 RangeSet.py:657(issuperset)
    18944    0.042    0.000    0.735    0.000 NodeSet.py:996(_scan_string)
        2    0.000    0.000    0.640    0.320 RangeSet.py:1126(_sort)
        2    0.061    0.031    0.640    0.320 {method 'sort' of 'list' objects}
     9472    0.190    0.000    0.584    0.000 NodeSet.py:962(_scan_string_single)
     9473    0.033    0.000    0.579    0.000 RangeSet.py:1128(rgveckeyfunc)
    18946    0.039    0.000    0.534    0.000 RangeSet.py:898(copy)
        1    0.011    0.011    0.531    0.531 RangeSet.py:1190(_fold_multivariate_expand)
    75776    0.154    0.000    0.526    0.000 RangeSet.py:194(fromone)

We'll investigate.
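
To illustrate why nD folding can get expensive, here is a toy sketch of folding by pairwise merging. It is an assumption-laden illustration and not the code in RangeSet.py: the function `fold_nd` is hypothetical, each vector is a tuple of frozensets (one per dimension), and two vectors are merged when they differ in exactly one dimension.

```python
from itertools import product

def fold_nd(vectors):
    """Naively fold nD vectors by repeated pairwise merging (toy example).

    Returns the folded vectors and the number of pairwise comparisons made.
    """
    vecs = list(vectors)
    comparisons = 0
    merged = True
    while merged:
        merged = False
        for i in range(len(vecs)):
            for j in range(i + 1, len(vecs)):
                comparisons += 1
                a, b = vecs[i], vecs[j]
                # Dimensions in which the two vectors differ
                diff = [k for k in range(len(a)) if a[k] != b[k]]
                if len(diff) == 1:
                    k = diff[0]
                    # Union along the single differing dimension
                    vecs[i] = a[:k] + (a[k] | b[k],) + a[k + 1:]
                    del vecs[j]
                    merged = True
                    break
            if merged:
                break  # restart the scan after every merge
    return vecs, comparisons

# A small 3 x 2 x 2 grid of single-element vectors
points = [tuple(frozenset([v]) for v in p)
          for p in product(range(3), range(2), range(2))]
folded, comparisons = fold_nd(points)
print(len(folded), "vector(s) after", comparisons, "comparisons")
```

Even in this toy version the number of comparisons grows much faster than the number of input vectors, and every comparison involves set operations. That is consistent with the tens of millions of copy(), __and__() and __eq__() calls in the profile above.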

Thanks for the super-quick patch for this. In testing, it works MUCH better.

[root@admin1.borg ~]# time (cluset -e x[2000-2073]c[0-7]s[0-7]b[0-1] | cluset -f)
x[2000-2073]c[0-7]s[0-7]b[0-1]

real    0m2.849s
user    0m2.956s
sys     0m0.084s
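
For reference, the speedup between the two `real` timings reported in this thread can be worked out as follows. This is a small sketch; the helper `parse_real` is hypothetical and only handles the `<minutes>m<seconds>s` format shown above.

```python
import re

def parse_real(value):
    """Convert a bash `time` value such as '6m38.483s' to seconds."""
    m = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if m is None:
        raise ValueError(f"unrecognized time format: {value!r}")
    return int(m.group(1)) * 60 + float(m.group(2))

before = parse_real("6m38.483s")  # before the patch
after = parse_real("0m2.849s")    # after the patch

print(f"{before:.3f}s -> {after:.3f}s, about {before / after:.0f}x faster")
# 398.483s -> 2.849s, about 140x faster
```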