error when trying to use stat-gui -a host:pid
Hello,
I'm trying to make it simpler for users to start stat-gui for a given Slurm job ID.
I thought the -a flag of stat-gui was the way to go, but when I tested it I got the error below:
$ stat-gui -a ab4-2034:603788
Traceback (most recent call last):
File "/lus/h2resw01/hpcperm/atosla/Tools/spack/opt/spack/linux-rhel8-zen/gcc-8.4.1/stat-develop-7kijzqyi64mccgk5753lygvzvpuofbll/lib/python3.9/site-packages/STATmain.py", line 138, in <module>
args.func(args)
File "/lus/h2resw01/hpcperm/atosla/Tools/spack/opt/spack/linux-rhel8-zen/gcc-8.4.1/stat-develop-7kijzqyi64mccgk5753lygvzvpuofbll/lib/python3.9/site-packages/STATGUI.py", line 2481, in STATGUI_main
window = STATGUI(args)
File "/lus/h2resw01/hpcperm/atosla/Tools/spack/opt/spack/linux-rhel8-zen/gcc-8.4.1/stat-develop-7kijzqyi64mccgk5753lygvzvpuofbll/lib/python3.9/site-packages/STATGUI.py", line 413, in __init__
if args.gdb is not None:
AttributeError: 'Namespace' object has no attribute 'gdb'
I don't understand the "args.gdb is not None" check in the script referenced above.
regards,
Lucian
The -a option takes the hostname:PID of the srun process or if srun is running on the current host, just the PID. Note this is the PID not the SLURM job ID.
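To make the expected argument format concrete, here is a minimal sketch of splitting a [hostname:]PID string. The function name parse_attach_target is hypothetical and this is not STAT's own parsing code; it only illustrates the two accepted forms described above.

```python
def parse_attach_target(target):
    """Split '[hostname:]PID' into (hostname or None, pid).

    Hypothetical helper for illustration; not taken from STAT.
    """
    # rpartition keeps everything before the last ':' as the host part;
    # when there is no ':' at all, the host part is an empty string.
    host, _, pid_text = target.rpartition(":")
    if not pid_text.isdigit():
        raise ValueError(f"expected [hostname:]PID, got {target!r}")
    return (host or None, int(pid_text))


print(parse_attach_target("ab4-2034:603788"))  # remote srun process
print(parse_attach_target("603788"))           # srun on the current host
```

The PID here is the process ID of srun itself, not the Slurm job ID.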
That aside, it shouldn't be giving an error about the gdb attribute, so I will look into that aspect.
Ugh, sorry, I need to read the description more carefully. It looks like you are specifying a hostname:PID. I will investigate. Sorry for my mistake earlier.
I think I need to protect the if args.gdb statement with an "if HAVE_GDB_SUPPORT"
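A minimal sketch of that kind of guard, assuming the gdb option is only registered with the argument parser when GDB support is compiled in. The parser layout, the --gdb option name and the wants_gdb helper are assumptions for illustration, not STAT's actual code; only HAVE_GDB_SUPPORT and args.gdb come from this thread.

```python
import argparse

# Assumed stand-in for the build-time flag named in the thread.
HAVE_GDB_SUPPORT = False


def build_parser():
    parser = argparse.ArgumentParser(prog="stat-gui")
    parser.add_argument("-a", dest="attach", help="[hostname:]PID of the launcher")
    if HAVE_GDB_SUPPORT:
        # Hypothetical option: without GDB support it is never registered,
        # so the parsed Namespace has no 'gdb' attribute at all.
        parser.add_argument("--gdb", default=None)
    return parser


def wants_gdb(args):
    # The flag check short-circuits, and getattr with a default
    # cannot raise AttributeError even if the attribute is missing.
    return HAVE_GDB_SUPPORT and getattr(args, "gdb", None) is not None


args = build_parser().parse_args(["-a", "ab4-2034:603788"])
print(hasattr(args, "gdb"))  # the attribute is absent in this configuration
print(wants_gdb(args))
```

Either half of the guard would be enough on its own to avoid the AttributeError shown in the traceback.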
I just updated the develop branch. Can you please rebuild and try again?
Hi,
Very strange: I have rebuilt stat and now lmond doesn't start; see the stat-gui output below.
I did a git pull in my spack repository before the rebuild, which was probably not a very smart idea.
Even so, it should work.
I use this install command:
spack install stat@develop ^dyninst@10.2.1 ^automake@1.15 ^libffi@3.3
Is there any problem with it?
I'm now doing a full rebuild (install --fresh).
$ /lus/h2resw01/hpcperm/atosla/Tools/spack/opt/spack/linux-rhel8-zen/gcc-8.4.1/stat-develop-3a7n7z3j4fkshpx7pw4h6cwjk6j4wr2q/bin/stat-gui
/lus/h2resw01/hpcperm/atosla/Tools/spack/opt/spack/linux-rhel8-zen/gcc-8.4.1/python-3.9.12-zx3apzxaxhcu6qpz24pxs4ob7atwvpw6/lib/python3.9/site-packages/gi/overrides/Gtk.py:580: Warning: g_object_unref: assertion 'G_IS_OBJECT (object)' failed
return Gtk.Dialog.run(self, *args, **kwargs)
srun: Job 23339649 step creation temporarily disabled, retrying (Requested nodes are busy)
srun: Job 23339649 step creation still disabled, retrying (Requested nodes are busy)
srun: Job 23339649 step creation still disabled, retrying (Requested nodes are busy)
srun: Job 23339649 step creation still disabled, retrying (Requested nodes are busy)
Can you make sure that your rm_slurm.conf has the --overlap flag?
[testnewm@corona212 spack]$ cat opt/spack/linux-rhel8-zen2/gcc-10.3.1/launchmon-master-z2j6medmhsr4pzrrilk6yo5m6iiqig6y/etc/rm_slurm.conf | grep overlap
## Aug 13 2021 GLL: Add --overlap to launch_str if configure detects srun version >= 20.11
RM_launch_str= --overlap --input=none --gres=none --mem-per-cpu=0 --jobid=%j --nodes=%n --ntasks=%n --nodelist=%l %d %o --lmonsharedsec=%s --lmonsecchk=%c
I think not; my grep finds only the comment:
cat ./opt/spack/linux-rhel8-zen/gcc-8.4.1/launchmon-master-zy65elfjejscnzmit6jr2ypexlg7lzue/etc/rm_slurm.conf | grep overlap
Aug 13 2021 GLL: Add --overlap to launch_str if configure detects srun version >= 20.11
Should I put it back by hand?
I think it is worth a try, so please add it back by hand and let me know if it works. Can you also send the output of "srun --version"? It looks like you are using a version that I did not flag in the configure script, and maybe I should set the threshold to less than 20.11.
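Since a plain grep for "overlap" also matches the comment line, a check that looks only at the RM_launch_str setting can avoid a false positive. This is a minimal sketch, assuming the key=value layout shown in the rm_slurm.conf excerpt above; the function name is hypothetical and the sample strings are abbreviated from the grep output in this thread.

```python
def launch_str_has_overlap(conf_text):
    """Return True if the RM_launch_str setting contains --overlap.

    Comment lines (starting with '#') are ignored, so a comment that
    merely mentions --overlap does not count.
    """
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#") or "=" not in stripped:
            continue
        key, _, value = stripped.partition("=")
        if key.strip() == "RM_launch_str":
            return "--overlap" in value.split()
    return False


comment_only = "## Aug 13 2021 GLL: Add --overlap to launch_str if configure detects srun version >= 20.11\n"
with_flag = comment_only + "RM_launch_str= --overlap --input=none --gres=none --jobid=%j\n"
without_flag = comment_only + "RM_launch_str= --input=none --gres=none --jobid=%j\n"

print(launch_str_has_overlap(with_flag))
print(launch_str_has_overlap(without_flag))
```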
Yep,
that line fixes the srun problem. The -a option works as well with the new build.
We had a recent upgrade of slurm on our systems.
Glad to hear that worked! Did you rebuild launchmon after the slurm upgrade? My guess is "no", but I just want to confirm.
No, but I'm running a fresh rebuild now. I'll be back when it is ready.
ok, thanks. With the rebuild, launchmon should detect the new slurm version and add the --overlap flag
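For reference, the version test described in the rm_slurm.conf comment (add --overlap when srun is at least 20.11) can be sketched as below. This is not launchmon's configure logic, which is not shown in this thread; the function names are hypothetical, the sample version strings are made up, and the "slurm X.Y.Z" format of "srun --version" output is an assumption.

```python
import re

# Threshold taken from the rm_slurm.conf comment quoted in this thread.
OVERLAP_MIN_VERSION = (20, 11)


def slurm_version(version_output):
    """Extract (major, minor) from text such as 'slurm 22.05.3'."""
    match = re.search(r"(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError(f"no version found in {version_output!r}")
    return (int(match.group(1)), int(match.group(2)))


def needs_overlap(version_output):
    # Tuples compare element by element, so (20, 2) < (20, 11) < (22, 5).
    return slurm_version(version_output) >= OVERLAP_MIN_VERSION


for sample in ("slurm 19.05.5", "slurm 20.02.7", "slurm 20.11.8", "slurm 22.05.3"):
    print(sample, needs_overlap(sample))
```

Comparing integer tuples rather than strings matters here, because a string comparison would rank "20.2" above "20.11".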
Please let me know when you get a chance to rebuild launchmon against your new slurm and let me know if it works without any hand modifications. Thanks.
Hi,
I had a look at the fresh rebuild, which has a freshly built launchmon (I checked).
The first run of stat-gui failed with an xdot error. I activated py-xdot, but with no success.
The error message mentioned something about the harfbuzz lib. I found out that the fresh build had picked
harfbuzz@4.2.1, while the previous build used harfbuzz@2.9.1.
I reinstalled with ^harfbuzz@2.9.1 and it worked fine.
I'd guess that you need to look a bit into the harfbuzz dependency in order to make everything smooth.
I just did my own test and stat-gui ran OK for me with harfbuzz 4.2.1. Can you send the full error output that you saw?
Hi,
Attached is the output of stat-gui when using the fresh build with harfbuzz 4.2.1.
I have loaded py-xdot but I get the same error.
Strangely enough, I don't see any mention of harfbuzz in this attempt!
In the same file, after the error message, I have attached the output of spack find -d ...
BTW, spack prints this warning
Warning: spack activate is deprecated in favor of environments and will be removed in v0.19.0
and I'm using v0.19.0 :)
PS I'm on leave for the next two weeks with intermittent access to email.
It looks like a problem finding the xdot module, but there aren't any specifics as to why it is failing to find xdot. If you run your spack's python and try to import xdot, what do you see? This is what I get
$ ./opt/spack/linux-rhel8-zen2/gcc-10.3.1/python-3.9.13-arlupivuhblfnn5obmr2hxi3xd67glqu/bin/python
Python 3.9.13 (main, Jun 10 2022, 12:23:33)
[GCC 10.3.1 20210422 (Red Hat 10.3.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import xdot
(.:1345480): dbind-WARNING **: 09:19:14.222: Couldn't connect to accessibility bus: Failed to connect to socket /tmp/dbus-KHLO80AzNa: Connection refused
>>> print(xdot.__file__)
/usr/WS2/testnewm/spack/opt/spack/linux-rhel8-zen2/gcc-10.3.1/python-3.9.13-arlupivuhblfnn5obmr2hxi3xd67glqu/lib/python3.9/site-packages/xdot/__init__.py
Hi,
I have reinstalled stat with
bin/spack install -v stat@develop ^harfbuzz@4.2.1
and stat-gui now starts, but the GUI crashes in less than one second with "Segmentation fault".
My old install command is as follows:
spack install -v stat@develop ^dyninst@10.2.1 ^automake@1.15 ^libffi@3.3
Are these explicit dependencies still needed?
Have you done a git pull for your spack recently? One GUI seg fault was fixed in spack/spack#30595, which adds libffi@:3.3 to glib, to avoid having to explicitly specify it on the spack install of stat.
for the record
[atosla@ab6-100 spack]$ bin/spack --version
0.19.0.dev0 (01f8236bf5faca20859ed3c4bb7dd179a4178e18)
A build with spack install -v stat@develop ^dyninst@10.2.1 ^automake@1.15 ^libffi@3.3 ^harfbuzz@4.2.1
works fine.
After a git pull I got the same version of spack, so I'll stay with this build for the time being.
After a git pull in my spack dir I built stat without ^libffi@3.3.
spack find -d ... shows that libffi@3.4.2 was picked, and stat-gui crashes with just two words: "Segmentation fault".
So I'll stay with libffi@3.3.1 for the time being.
I'll close this for now.
Just out of curiosity, can you see if your var/spack/repos/builtin/packages/gobject-introspection/package.py contains this line:
depends_on('libffi@:3.3', when='@:1.70') # libffi 3.4 caused seg faults
And also check which version of gobject-introspection your stat build is picking up?
I got this
$ grep libffi var/spack/repos/builtin/packages/gobject-introspection/package.py
depends_on('libffi')
depends_on('libffi@:3.3', when='@:1.70') # libffi 3.4 caused seg faults
Is it the first depends_on that breaks things?
[atosla@aa6-100 spack]$ bin/spack find -d /chkrckl | grep gobject
gobject-introspection@1.56.1
py-pygobject@3.38.0
That's strange. The first depends_on is required and should not override the version constraint in the second depends_on. Do you perhaps have an older gobject-introspection installation? Perhaps you had installed that before I added the libffi@:3.3 dependency in Spack.
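To illustrate how the two depends_on lines are meant to combine, here is a deliberately simplified model of the conditional constraint. It is not Spack's concretizer, and the helper names are hypothetical; it assumes only that an upper bound such as @:3.3 also admits point releases under it, and that the second constraint applies only when gobject-introspection itself matches @:1.70.

```python
def parse_version(text):
    return tuple(int(part) for part in text.split("."))


def at_most(version, bound):
    """Simplified '@:bound' check: compare only as many components as the
    bound has, so a bound of 3.3 admits 3.3 and its point releases."""
    v, b = parse_version(version), parse_version(bound)
    return v[:len(b)] <= b


def libffi_allowed(gobject_introspection, libffi):
    # depends_on('libffi') alone places no limit on the version.
    # depends_on('libffi@:3.3', when='@:1.70') narrows it, but only for
    # gobject-introspection versions up to 1.70.
    if at_most(gobject_introspection, "1.70"):
        return at_most(libffi, "3.3")
    return True


print(libffi_allowed("1.56.1", "3.4.2"))  # constraint applies, 3.4.2 rejected
print(libffi_allowed("1.56.1", "3.3"))    # constraint applies, 3.3 accepted
print(libffi_allowed("1.72.0", "3.4.2"))  # constraint does not apply
```

Under this model, a gobject-introspection@1.56.1 built after the constraint was added should not end up with libffi@3.4.2, which is why an installation that predates the constraint is a plausible explanation.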
As noted in the mention, I put in another PR in spack. The latest gobject introspection did not get the fix/update that we were expecting, so we need to enforce the libffi version for the latest gobject-introspection 1.72.0. Thanks for reporting this and for your patience. I hope the explicit libffi dependence will suffice in the meantime.
My most recent testing seems to indicate that gobject-introspection 1.72.0 with libffi 3.4.2 works OK for py-xdot. After activating it, I did get this error:
rzalastor4{testnewm}33: ./opt/spack/linux-rhel8-ivybridge/gcc-10.3.1/py-xdot-1.2-n4xtla7ncbms4x7h2sfoprbqm4k5klas/bin/xdot
Traceback (most recent call last):
File "/usr/WS2/testnewm/delete/spack/./opt/spack/linux-rhel8-ivybridge/gcc-10.3.1/py-xdot-1.2-n4xtla7ncbms4x7h2sfoprbqm4k5klas/bin/xdot", line 7, in <module>
from xdot.__main__ import main
File "/usr/WS2/testnewm/delete/spack/opt/spack/linux-rhel8-ivybridge/gcc-10.3.1/python-3.9.13-37wyoqx5fqgkgctkg7svipdwy5u2ri6e/lib/python3.9/site-packages/xdot/__init__.py", line 28, in <module>
from . import ui
File "/usr/WS2/testnewm/delete/spack/opt/spack/linux-rhel8-ivybridge/gcc-10.3.1/python-3.9.13-37wyoqx5fqgkgctkg7svipdwy5u2ri6e/lib/python3.9/site-packages/xdot/ui/__init__.py", line 3, in <module>
from .window import DotWidget, DotWindow
File "/usr/WS2/testnewm/delete/spack/opt/spack/linux-rhel8-ivybridge/gcc-10.3.1/python-3.9.13-37wyoqx5fqgkgctkg7svipdwy5u2ri6e/lib/python3.9/site-packages/xdot/ui/window.py", line 25, in <module>
import gi
ModuleNotFoundError: No module named 'gi'
I could work around this by loading the py-xdot module that spack installs:
bash-4.4$ ml py-xdot-1.2-gcc-10.3.1-n4xtla7
bash-4.4$ xdot
and then the GUI pops up.
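Both import problems in this thread (xdot earlier, gi here) come down to which modules a given Python interpreter can see. A minimal sketch for checking that without actually importing the modules, using only the standard library; the module_origins helper is hypothetical, and it should be run with the same spack-built python that stat-gui or xdot uses.

```python
import importlib.util


def module_origins(names):
    """Map each top-level module name to the file it would load from,
    or to 'not found' if this interpreter cannot locate it."""
    origins = {}
    for name in names:
        spec = importlib.util.find_spec(name)
        if spec is None:
            origins[name] = "not found"
        else:
            # Namespace packages have no single origin file.
            origins[name] = spec.origin or "namespace package"
    return origins


for name, origin in module_origins(["xdot", "gi"]).items():
    print(f"{name}: {origin}")
```

If gi shows up as "not found" before loading the py-xdot module and resolves to a path afterwards, the module load is what puts py-pygobject on the Python path.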