T4 lysozyme tutorial with implicit solvent appears to hang at first instance of execute _propagate_replica()
therealchrisneale opened this issue · 0 comments
Hello,
The following command takes about 4 minutes to reach “execute _propagate_replica(0)”, then produces no further output for more than 10 minutes while still consuming CPU resources:
mpiexec.hydra -np 8 yank script --yaml=p-xylene-implicit.yaml
bash-4.2$ head -n 20 nohup.out
Running simulation...
Warning: importing 'simtk.openmm' is deprecated. Import 'openmm' instead.
Warning: importing 'simtk.openmm' is deprecated. Import 'openmm' instead.
Warning: importing 'simtk.openmm' is deprecated. Import 'openmm' instead.
Warning: importing 'simtk.openmm' is deprecated. Import 'openmm' instead.
Warning: importing 'simtk.openmm' is deprecated. Import 'openmm' instead.
Warning: importing 'simtk.openmm' is deprecated. Import 'openmm' instead.
Warning: importing 'simtk.openmm' is deprecated. Import 'openmm' instead.
Warning: importing 'simtk.openmm' is deprecated. Import 'openmm' instead.
2022-05-20 13:01:11,339: Setting 'CpuThreads' to 1 because MPI is active.
2022-05-20 13:01:11,629: Node 1/8: executing <function ExperimentBuilder._check_resume at 0x2b4fbddf40d0>
2022-05-20 13:01:11,631: Node 1/8: waiting for barrier after <function ExperimentBuilder._check_resume at 0x2b4fbddf40d0>
2022-05-20 13:01:11,727: Group 1/8 Node 1/1: execute _setup_molecules(p-xylene)
2022-05-20 13:01:12,583: Fixing net charge from -2.000000000015878e-06 to 4.163336342344337e-17
2022-05-20 13:01:12,595: Node 1/8: waiting for barrier after _setup_molecules
2022-05-20 13:01:12,604: Group 1/8 Node 1/1: execute get_system(t4-xylene)
2022-05-20 13:01:12,606: Setting up the systems for t4-lysozyme and p-xylene using solvent GBSA
2022-05-20 13:01:12,606: Setting up solvent phase
2022-05-20 13:01:13,047: Setting up complex phase
2022-05-20 13:01:13,831: Node 1/8: waiting for barrier after get_system
bash-4.2$ tail -n 20 nohup.out
2022-05-20 13:05:09,532: on stmt: size = arg(0, name=size)
2022-05-20 13:05:09,532: on stmt: $0.1 = global(np: <module 'numpy' from '/usr/projects/mrmdesign/MCMD/CONDA_ENVS/yank-badger/lib/python3.6/site-packages/numpy/__init__.py'>)
2022-05-20 13:05:09,532: on stmt: $0.2 = getattr(value=$0.1, attr=random)
2022-05-20 13:05:09,532: on stmt: $0.3 = getattr(value=$0.2, attr=random)
2022-05-20 13:05:09,532: on stmt: $0.4 = call $0.3(func=$0.3, args=[], kws=(), vararg=None)
2022-05-20 13:05:09,533: on stmt: $0.5 = cast(value=$0.4)
2022-05-20 13:05:09,533: on stmt: return $0.5
2022-05-20 13:05:09,533: defs defaultdict(<class 'list'>,
{'$0.1': [<numba.core.ir.Assign object at 0x2b4fc24cd908>],
'$0.2': [<numba.core.ir.Assign object at 0x2b4fc24cd9e8>],
'$0.3': [<numba.core.ir.Assign object at 0x2b4fc24cdac8>],
'$0.4': [<numba.core.ir.Assign object at 0x2b4fc24cdba8>],
'$0.5': [<numba.core.ir.Assign object at 0x2b4fc24cdc88>],
'size': [<numba.core.ir.Assign object at 0x2b4fc24cd828>]})
2022-05-20 13:05:09,533: SSA violators set()
2022-05-20 13:05:09,862: Mixing of replicas took 0.595s
2022-05-20 13:05:09,862: Accepted 31250/31250 attempted swaps (100.0%)
2022-05-20 13:05:09,862: Node 1/8: waiting for broadcast of <function ReplicaExchangeSampler._mix_replicas at 0x2b4fbbf810d0>
2022-05-20 13:05:09,863: Propagating all replicas...
2022-05-20 13:05:09,863: Node 1/8: execute _propagate_replica(0)
No more output is produced after this point, although top shows that the processes are still consuming CPU resources:
bash-4.2$ date; tail -n 1 nohup.out
Fri May 20 13:17:51 MDT 2022
2022-05-20 13:05:09,863: Node 1/8: execute _propagate_replica(0)
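For reference, the two durations quoted above can be read off the timestamps in these excerpts. Below is a minimal Python sketch of that arithmetic. The helper name parse_log_timestamp is only illustrative and is not part of YANK. The sketch assumes the log lines keep the "YYYY-MM-DD HH:MM:SS,mmm: message" shape shown here and that the log timestamps and the date output are in the same local time zone.

```python
from datetime import datetime

# Timestamp layout used at the start of each log line in nohup.out.
LOG_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_log_timestamp(line):
    """Parse the leading timestamp of a 'YYYY-MM-DD HH:MM:SS,mmm: message' line.

    Hypothetical helper for this report only; not a YANK function.
    """
    # The first ": " (colon followed by a space) ends the timestamp field.
    stamp = line.split(": ", 1)[0]
    return datetime.strptime(stamp, LOG_FORMAT)


# First and last timestamped lines, copied from the excerpts above.
first_line = "2022-05-20 13:01:11,339: Setting 'CpuThreads' to 1 because MPI is active."
last_line = "2022-05-20 13:05:09,863: Node 1/8: execute _propagate_replica(0)"

# Wall-clock time reported by `date` (assumed to share the log's time zone).
checked_at = datetime(2022, 5, 20, 13, 17, 51)

startup = parse_log_timestamp(last_line) - parse_log_timestamp(first_line)
stalled = checked_at - parse_log_timestamp(last_line)

print("time to reach _propagate_replica(0):", startup)
print("time with no new output:", stalled)
```

With the values above, this gives just under 4 minutes of setup followed by a little under 13 minutes with no new log lines.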
Thank you,
Chris.