Replica state indices array contains integers with large magnitude
zhang-ivy opened this issue · 0 comments
I'm observing an issue with the replica state indices saved in the .nc
file of one of my RBD:ACE2 complex phase repex simulations. For one particular iteration, the replica state indices (which should be between 0 and 35 because there are 36 states in the repex simulation) are very large. I wonder if this is caused by overflow?
I'm using openmm 8, openmmtools 0.21.5, and python 3.10 to run this. I've attached a dump of my env. I can't attach the .nc
file because its too big (even when compressed), but can supply the filepath on lilac (@ijpulidos )
import numpy as np
from openmmtools.multistate import MultiStateReporter, MultiStateSamplerAnalyzer
max_n_iterations = 20000
path = "2_complex.nc"
reporter = MultiStateReporter(path, open_mode='r')
analyzer = MultiStateSamplerAnalyzer(reporter, max_n_iterations=max_n_iterations)
equilibration_data = analyzer._get_equilibration_data()
yields this error:
IndexError Traceback (most recent call last)
Cell In[6], line 6
4 reporter = MultiStateReporter(path, open_mode='r')
5 analyzer = MultiStateSamplerAnalyzer(reporter, max_n_iterations=max_n_iterations)
----> 6 equilibration_data = analyzer._get_equilibration_data()
7 # print(f"n_equilibration_iterations: {n_equilibration_iterations}")
8 # print(f"subsample rate: {subsample_rate}")
File ~/miniconda3/envs/perses-paper7/lib/python3.10/site-packages/openmmtools/multistate/multistateanalyzer.py:2046, in MultiStateSamplerAnalyzer._get_equilibration_data(self, energies, neighborhoods, replica_state_indices)
2043 n_effective_max = (self.max_n_iterations - n_equilibration + 1) / g_t
2045 else:
-> 2046 u_n = self.get_effective_energy_timeseries(energies=energies, neighborhoods=neighborhoods, replica_state_indices=replica_state_indices)
2048 # For SAMS, if there is a second-stage start time, use only the asymptotically optimal data
2049 t0 = self._n_equilibration_iterations if self._n_equilibration_iterations is not None else 1 # if self._n_equilibration_iterations was not specified, discard minimization frame
File ~/miniconda3/envs/perses-paper7/lib/python3.10/site-packages/openmmtools/multistate/multistateanalyzer.py:1458, in MultiStateSamplerAnalyzer.get_effective_energy_timeseries(self, energies, neighborhoods, replica_state_indices)
1455 for iteration in range(n_iterations):
1456 # Slice the current sampled states by those replicas.
1457 states_slice = replica_state_indices[:, iteration]
-> 1458 u_n[iteration] = np.sum(energies[replicas_slice, states_slice, iteration])
1460 # Correct for potentially-changing log weights
1461 if has_log_weights:
IndexError: index 1699962488 is out of bounds for axis 1 with size 36
The error is occurring in get_effective_energy_timeseries()
. When I run the code in that function:
# create effective energy timeseries (copied from MultiStateSamplerAnalyzer.get_effective_energy_timeseries())
energies, _, neighborhoods, replica_state_indices = analyzer._read_energies(truncate_max_n_iterations=True)
n_replicas, n_states, n_iterations = energies.shape
u_n = np.zeros([n_iterations], np.float64)
# Slice of all replicas, have to use this as : is too greedy
replicas_slice = range(n_replicas)
for iteration in range(n_iterations):
# Slice the current sampled states by those replicas.
states_slice = replica_state_indices[:, iteration]
u_n[iteration] = np.sum(energies[replicas_slice, states_slice, iteration])
I find that at iteration 19310, I get the following error:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
Cell In[7], line 12
10 # Slice the current sampled states by those replicas.
11 states_slice = replica_state_indices[:, iteration]
---> 12 u_n[iteration] = np.sum(energies[replicas_slice, states_slice, iteration])
IndexError: index 1699962488 is out of bounds for axis 1 with size 36
When I examine replica_state_indices[:, 19310]
:
masked_array(data=[ 1699962488, -675782556, 25841912, 867372552,
520093744, 2018181943, 1684362078, 907524192,
134318661, 808579126, 622460928, 1584973570,
1617192275, 1274706784, -1912077942, 3158450,
39983616, 1398700273, 1616929893, -1974732165,
-1307703295, 12337, -318615014, 1699962488,
2069913700, 25840634, 833687048, 436207664,
2028864086, 1684362078, -92577696, 134318667,
808563470, 1511653376, 1584983810, 1617192275],
mask=False,
fill_value=999999)
The values in this array should be between 0 and 35 (because there are 36 states in this simulation), but are instead very large in magnitude.
The replica_state_indices
for all other iterations look fine. For example, here is `replica_state_indices[:, 19311]:
masked_array(data=[14, 4, 31, 2, 10, 21, 19, 20, 7, 28, 22, 26, 13, 3,
24, 16, 27, 33, 25, 23, 30, 18, 11, 35, 32, 15, 17, 29,
34, 0, 6, 12, 5, 9, 1, 8],
mask=False,
fill_value=999999)