choderalab/openmmtools

Replica state indices array contains integers with large magnitude

zhang-ivy opened this issue · 0 comments

I'm observing an issue with the replica state indices saved in the .nc file of one of my RBD:ACE2 complex phase repex simulations. For one particular iteration, the replica state indices (which should be between 0 and 35 because there are 36 states in the repex simulation) are very large. I wonder if this is caused by overflow?

I'm using openmm 8, openmmtools 0.21.5, and python 3.10 to run this. I've attached a dump of my env. I can't attach the .nc file because its too big (even when compressed), but can supply the filepath on lilac (@ijpulidos )

import numpy as np
from openmmtools.multistate import MultiStateReporter, MultiStateSamplerAnalyzer

max_n_iterations = 20000
path = "2_complex.nc"
reporter = MultiStateReporter(path, open_mode='r')
analyzer = MultiStateSamplerAnalyzer(reporter, max_n_iterations=max_n_iterations)
equilibration_data = analyzer._get_equilibration_data()

yields this error:

IndexError                                Traceback (most recent call last)
Cell In[6], line 6
      4 reporter = MultiStateReporter(path, open_mode='r')
      5 analyzer = MultiStateSamplerAnalyzer(reporter, max_n_iterations=max_n_iterations)
----> 6 equilibration_data = analyzer._get_equilibration_data()
      7 # print(f"n_equilibration_iterations: {n_equilibration_iterations}")
      8 # print(f"subsample rate: {subsample_rate}")

File ~/miniconda3/envs/perses-paper7/lib/python3.10/site-packages/openmmtools/multistate/multistateanalyzer.py:2046, in MultiStateSamplerAnalyzer._get_equilibration_data(self, energies, neighborhoods, replica_state_indices)
   2043     n_effective_max = (self.max_n_iterations - n_equilibration + 1) / g_t
   2045 else:
-> 2046     u_n = self.get_effective_energy_timeseries(energies=energies, neighborhoods=neighborhoods, replica_state_indices=replica_state_indices)
   2048     # For SAMS, if there is a second-stage start time, use only the asymptotically optimal data
   2049     t0 = self._n_equilibration_iterations if self._n_equilibration_iterations is not None else 1 # if self._n_equilibration_iterations was not specified, discard minimization frame

File ~/miniconda3/envs/perses-paper7/lib/python3.10/site-packages/openmmtools/multistate/multistateanalyzer.py:1458, in MultiStateSamplerAnalyzer.get_effective_energy_timeseries(self, energies, neighborhoods, replica_state_indices)
   1455 for iteration in range(n_iterations):
   1456     # Slice the current sampled states by those replicas.
   1457     states_slice = replica_state_indices[:, iteration]
-> 1458     u_n[iteration] = np.sum(energies[replicas_slice, states_slice, iteration])
   1460     # Correct for potentially-changing log weights
   1461     if has_log_weights:

IndexError: index 1699962488 is out of bounds for axis 1 with size 36

The error is occurring in get_effective_energy_timeseries(). When I run the code in that function:

# create effective energy timeseries (copied from MultiStateSamplerAnalyzer.get_effective_energy_timeseries())
energies, _, neighborhoods, replica_state_indices = analyzer._read_energies(truncate_max_n_iterations=True)
n_replicas, n_states, n_iterations = energies.shape

u_n = np.zeros([n_iterations], np.float64)
# Slice of all replicas, have to use this as : is too greedy
replicas_slice = range(n_replicas)
for iteration in range(n_iterations):
    # Slice the current sampled states by those replicas.
    states_slice = replica_state_indices[:, iteration]
    u_n[iteration] = np.sum(energies[replicas_slice, states_slice, iteration])

I find that at iteration 19310, I get the following error:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[7], line 12
     10 # Slice the current sampled states by those replicas.
     11 states_slice = replica_state_indices[:, iteration]
---> 12 u_n[iteration] = np.sum(energies[replicas_slice, states_slice, iteration])

IndexError: index 1699962488 is out of bounds for axis 1 with size 36

When I examine replica_state_indices[:, 19310]:

masked_array(data=[ 1699962488,  -675782556,    25841912,   867372552,
                     520093744,  2018181943,  1684362078,   907524192,
                     134318661,   808579126,   622460928,  1584973570,
                    1617192275,  1274706784, -1912077942,     3158450,
                      39983616,  1398700273,  1616929893, -1974732165,
                   -1307703295,       12337,  -318615014,  1699962488,
                    2069913700,    25840634,   833687048,   436207664,
                    2028864086,  1684362078,   -92577696,   134318667,
                     808563470,  1511653376,  1584983810,  1617192275],
             mask=False,
       fill_value=999999)

The values in this array should be between 0 and 35 (because there are 36 states in this simulation), but are instead very large in magnitude.

The replica_state_indices for all other iterations look fine. For example, here is `replica_state_indices[:, 19311]:

masked_array(data=[14,  4, 31,  2, 10, 21, 19, 20,  7, 28, 22, 26, 13,  3,
                   24, 16, 27, 33, 25, 23, 30, 18, 11, 35, 32, 15, 17, 29,
                   34,  0,  6, 12,  5,  9,  1,  8],
             mask=False,
       fill_value=999999)

perses-paper7-explicit.txt.zip