dyninst/mrnet

graph is not connected, found 0 potential roots

Closed this issue · 16 comments

Anyone know what would cause this error:

graph is not connected, found 0 potential roots

This is using STAT as reported by a user in LLNL/STAT#46. I'm wondering if it is because of the ".bullx" in the FE hostname. Does MRNet think it is a fully qualified hostname? In this case, I don't think "bullx" is the domain name.

I have a vague memory that when we see a "." in the name, we assume it's fully-qualified. Let me see if I can track that down.

Yep, the offending code is in xplat/src/NetUtils.C NetUtils::GetHostName() lines 293-298. If we find a '.', we only use the substring that precedes it as the host's name.
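For anyone following along, the behavior described above can be sketched roughly as below. This is a minimal illustration, not the actual code from NetUtils::GetHostName(); the helper name `truncate_at_first_dot` and the example host `fe01.bullx` are hypothetical.

```cpp
#include <string>

// Hypothetical helper (not MRNet's implementation): keep only the part of
// the host name that precedes the first '.', treating the rest as a domain.
std::string truncate_at_first_dot(const std::string& host) {
    const std::string::size_type dot = host.find('.');
    // No '.' present: the name is returned unchanged.
    return dot == std::string::npos ? host : host.substr(0, dot);
}
```

With logic like this, a front-end name such as `fe01.bullx` would be shortened to `fe01`, even though `.bullx` is part of the host name rather than a domain. That kind of mismatch is presumably what leaves the topology graph without a root.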

Would it be possible to modify MRNet to tolerate a "." in the host name?

It's certainly possible to at least add an environment variable that disables the truncation. I'm far enough removed from MRNet use and testing that I would be afraid of disabling it entirely.

A possible quick workaround that avoids needing to change MRNet code is to use IP addresses in the MRNet hosts file.

@MichaelBrim I'm not sure who is supporting MRNet these days. Are you, or is someone else, going to implement the env var?

@lee218llnl I don't think anyone is supporting it per se. I can create a branch that has the change, but I'm not in a position to test it.

@MichaelBrim If you can implement this in a branch, I believe @antonl321 may be able to test.

@lee218llnl @antonl321 OK, I have a potential fix in https://github.com/dyninst/mrnet/tree/fix-host-truncation. To test the fix, please set XPLAT_USE_FQDN=1 in the environment when running.
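As a rough picture of what an environment-variable opt-out like this might look like, here is a minimal sketch. It is not the code from the fix-host-truncation branch: the function name `host_name_for_topology` is hypothetical, and treating only the value "1" as "enabled" is an assumption based on the instructions above.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical sketch (not the branch's actual code): skip the truncation
// when XPLAT_USE_FQDN is set to "1" in the environment.
std::string host_name_for_topology(const std::string& host) {
    const char* flag = std::getenv("XPLAT_USE_FQDN");
    // Assumption: only the exact value "1" enables the full-name behavior.
    const bool keep_full_name = (flag != nullptr) && (std::string(flag) == "1");
    if (keep_full_name) {
        return host;  // use the name as given, dots included
    }
    // Default behavior: keep only the text before the first '.'.
    const std::string::size_type dot = host.find('.');
    return dot == std::string::npos ? host : host.substr(0, dot);
}
```

Keeping truncation as the default and making the full name opt-in matches the concern raised earlier about not disabling the existing behavior for everyone.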

Mike, that appears to work for the STAT user. Thanks! Can this get pushed to master and perhaps into a release? I guess we can also tag the commit in Spack.

@antonl321, @lee218llnl is planning to update the MRNet Spack package to add a version (5.0.1-4) that includes this fix (see spack/spack#33120). Until that Spack change is merged, the current 5.0.1-3 version actually points at master, so a reinstall would probably pick up the fix as well.

@antonl321 The Spack PR was merged, and the new version is 5.0.1-4, as Mike indicated. To be explicit, you can add "^mrnet@5.0.1-4" to your spec when doing a spack install of stat.

Err, I got the following error when trying to install with Spack:

==> Installing mrnet-5.0.1-4-pl3a5wgetpgbazxjql7iu3rfpuup2ai3
==> No binary for mrnet-5.0.1-4-pl3a5wgetpgbazxjql7iu3rfpuup2ai3 found: installing from source
==> Warning: Missing a source id for mrnet@5.0.1-4
==> Cannot find version 5.0.1-4 in url_list
==> Error: FetchError: All fetchers failed for spack-stage-mrnet-5.0.1-4-pl3a5wgetpgbazxjql7iu3rfpuup2ai3

/lus/h1resw03/bm/atosla/shared/Tools/spack-2/lib/spack/spack/package_base.py:1527, in do_fetch:
1524
1525 self.stage.create()
1526 err_msg = None if not self.manual_download else self.download_instr
1527 start_time = time.time()
1528 self.stage.fetch(mirror_only, err_msg=err_msg)
1529 self._fetch_time = time.time() - start_time

Did you pull the latest develop branch from Spack?

It built OK after a pull. I tested it on a small case, and I hope to test it at large scale this week or next.