graph is not connected, found 0 potential roots
Does anyone know what would cause this error:
graph is not connected, found 0 potential roots
This is from STAT, as reported by a user in LLNL/STAT#46. I'm wondering if it is because of the ".bullx" in the FE hostname. Does MRNet think it is a fully qualified hostname? In this case, I don't think "bullx" is the domain name.
I have a vague memory that when we see a "." in the name, we assume it's fully-qualified. Let me see if I can track that down.
Yep, the offending code is in xplat/src/NetUtils.C, in NetUtils::GetHostName() (lines 293-298). If we find a '.', we only use the substring that precedes it as the host's name.
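To make the truncation concrete, here is a minimal sketch of the behavior described, not the actual NetUtils::GetHostName() code. The function name truncate_at_first_dot and the sample host "fe01.bullx" are hypothetical.

```cpp
#include <string>

// Hypothetical helper mirroring the described behavior: if the name
// contains a '.', only the substring before the first '.' is kept.
std::string truncate_at_first_dot(const std::string& name) {
    std::string::size_type dot = name.find('.');
    if (dot == std::string::npos) {
        return name;  // no '.', the name is used unchanged
    }
    return name.substr(0, dot);  // drop the first '.' and everything after it
}
```

With this logic, "fe01.bullx" becomes "fe01", so a front-end name that legitimately contains a "." no longer matches the full name given elsewhere.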
Would it be possible to modify MRNet to tolerate a "." in the host name?
It's certainly possible to at least add an environment variable that disables the truncation. I'm far enough removed from MRNet use and testing that I would be afraid of disabling it entirely.
A possible quick workaround that avoids needing to change MRNet code is to use IP addresses in the MRNet hosts file.
@MichaelBrim I'm not sure who is supporting MRNet these days. Are you, or is someone else, going to implement the env var?
@lee218llnl I don't think anyone is supporting it per se. I can create a branch that has the change, but I'm not in a position to be able to test it.
@MichaelBrim If you can implement this in a branch, I believe @antonl321 may be able to test.
@lee218llnl @antonl321 OK, I have a potential fix in https://github.com/dyninst/mrnet/tree/fix-host-truncation. To test the fix, please set XPLAT_USE_FQDN=1 in the environment when running.
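One way an opt-out like XPLAT_USE_FQDN can gate the truncation is sketched below. This is an illustration under assumptions, not the code on the fix-host-truncation branch. Both helper names are hypothetical, and treating only the value "1" as "enabled" is an assumption based on the instruction above. The real branch may accept other values.

```cpp
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical: true only when XPLAT_USE_FQDN is set to exactly "1".
bool use_fqdn_requested() {
    const char* value = std::getenv("XPLAT_USE_FQDN");
    return value != nullptr && std::strcmp(value, "1") == 0;
}

// Hypothetical normalization: keep the full host name when the variable
// is set; otherwise fall back to truncating at the first '.'.
std::string normalize_host_name(const std::string& name) {
    if (use_fqdn_requested()) {
        return name;
    }
    std::string::size_type dot = name.find('.');
    return dot == std::string::npos ? name : name.substr(0, dot);
}
```

Keeping the old behavior as the default means existing setups are unaffected unless the variable is set explicitly.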
Mike, that appears to work for the STAT user. Thanks! Can this get pushed into master and perhaps a release? I guess we can also tag the commit in Spack.
ping @MichaelBrim
@antonl321, @lee218llnl is planning to update the MRNet Spack package to create a version (5.0.1-4) that includes this fix (see spack/spack#33120). Until that Spack change gets merged, the current 5.0.1-3 version actually points at master, so a reinstall would probably pick up the fix as well.
@antonl321 the Spack PR was merged and is version 5.0.1-4 as Mike indicated. To be explicit, you can add "^mrnet@5.0.1-4" to your spec when doing a spack install of stat.
Err, I got the following error when trying to install with Spack:
==> Installing mrnet-5.0.1-4-pl3a5wgetpgbazxjql7iu3rfpuup2ai3
==> No binary for mrnet-5.0.1-4-pl3a5wgetpgbazxjql7iu3rfpuup2ai3 found: installing from source
==> Warning: Missing a source id for mrnet@5.0.1-4
==> Cannot find version 5.0.1-4 in url_list
==> Error: FetchError: All fetchers failed for spack-stage-mrnet-5.0.1-4-pl3a5wgetpgbazxjql7iu3rfpuup2ai3
/lus/h1resw03/bm/atosla/shared/Tools/spack-2/lib/spack/spack/package_base.py:1527, in do_fetch:
1524
1525 self.stage.create()
1526 err_msg = None if not self.manual_download else self.download_instr
1527 start_time = time.time()
1528 self.stage.fetch(mirror_only, err_msg=err_msg)
1529 self._fetch_time = time.time() - start_time
Did you pull the latest develop branch of Spack?
It built OK after I did a pull. I tested it on a small case and hope to test at large scale this week or next.