Potential bug with freud_generate_bonds
daico007 opened this issue · 2 comments
Bug summary
Found this bug when working on #1124. When I tried to replicate the model and regenerate bonds, the original Compound.generate_bonds
gives different answer from Compound.freud_generate_bonds
. When compare with the correct value, the first is correct (generate_bonds
) while the latter is 1 short.
Code to reproduce the behavior
Please include a code snippet that can be used to reproduce this bug.
"""Beta-cristobalite surface."""
import itertools
import math
import numpy as np
import mbuild as mb
compound = mb.Compound(name="SilicaInterface")
dimensions = None
betacristabolite = mb.load(
"beta-cristobalite-expanded.mol2",
backend="gmso",
)
# Shift it back to origin
xs, ys, zs = list() , list(), list()
for particle in betacristabolite.particles():
xs.append(particle.pos[0])
ys.append(particle.pos[1])
zs.append(particle.pos[2])
betacristabolite.translate((-min(xs), -min(ys), -min(zs)))
compound.periodicity = (True, True, False)
# 1.3200 taken from boundingbox length rounded to 4 decimal places
ref_dims = betacristabolite.get_boundingbox().lengths #[5.3888, 4.6669, 1.3200]
if not dimensions:
box = mb.Box(ref_dims)
elif isinstance(dimensions, (list, tuple)) and len(dimensions) == 2:
box = mb.Box([dimensions[0], dimensions[1], ref_dims[2]])
elif isinstance(dimensions, (list, tuple)) and len(dimensions) == 3:
box = mb.Box(dimensions)
else:
raise ValueError(
"Unsupported value for dimension provided. "
"Only accept `None` (default), list/tuple of 2 (x and y dimension) "
"or list/tuple of 3(x, y, and z dimensions)"
)
compound.box = box
# Borrowed code from water_box recipe
scale_Lx = math.ceil(box.Lx / ref_dims[0])
scale_Ly = math.ceil(box.Ly / ref_dims[1])
scale_Lz = math.ceil(box.Lz / ref_dims[2])
silica_list = list()
missing = list()
for particle in list(betacristabolite.particles()):
for i, j, k in itertools.product(
range(scale_Lx),
range(scale_Ly),
range(scale_Lz),
):
shift = np.array(
[
i * ref_dims[0],
j * ref_dims[1],
k * ref_dims[2],
]
)
if all(particle.pos + shift <= box.lengths):
temp = mb.clone(particle)
temp.translate(shift)
silica_list.append(temp)
else:
missing.append(particle)
compound.add(silica_list)
count = 0
for particle in list(compound.particles()):
if particle.name.startswith("O") and particle.pos[2] >= box.Lz:
count += 1
port = mb.Port(
anchor=particle, orientation=[0, 0, 1], separation=0.1
)
compound.add(port, "port_{}".format(count))
particle.name = "O" # Strip numbers required in .mol2 files.
elif particle.name.startswith("O"):
particle.name = "O"
elif particle.name.startswith("Si"):
particle.name = "Si"
From here, if the original generate_bonds()
is used:
compound.generate_bonds(name_a='Si', name_b='O', dmin=0, dmax=0.17)
assert compound.n_bonds == betacristabolite.n_bonds # True as both is 2400
If the freud_generate_bonds()
is used:
compound.freud_generate_bonds(name_a='Si', name_b='O', dmin=0, dmax=0.17)
assert compound.n_bonds == betacristabolite.n_bonds # False, the first will return 2399 while the latter is 2400
I had this same problem in one of the demos I put together a while ago (same issue, one bond shy). This popped up for me after updating my conda environment at some point. I think it is a freud issue, not necessarily mbuild. I have a really minimal example...let me see if I can find a version of freud that provides the right answer and can submit a bug to them.
After a bit of digging, it is not a freud issue, but rather how we are passing particles to query when we are looking to make bonds between two different species.
For example, consider the following bit of code. A simple cubic lattice of particles (we will label each as carbon for simplicity, albeit not realistic).
import mbuild as mb
carbon_atom = mb.Compound(name='C', element='C')
grid_pattern = mb.Grid3DPattern(3, 3, 3)
grid_pattern.scale([0.45, 0.45, 0.45])
carbon_system_list = grid_pattern.apply(carbon_atom)
carbon_system = mb.Compound(carbon_system_list)
carbon_system.box = mb.Box([1,1,1])
carbon_system.freud_generate_bonds(name_a= "C", name_b="C",
dmin=0.0, dmax=0.16, exclude_ii=True)
print(carbon_system.n_bonds)
This will yield 54 bonds.
If we take the same basic code and initialize every other atom as oxygen, we can see the problems arise:
carbon_atom = mb.Compound(name='C', element='C')
grid_pattern = mb.Grid3DPattern(3, 3, 3)
grid_pattern.scale([0.45, 0.45, 0.45])
carbon_system_list = grid_pattern.apply(carbon_atom)
carbon_oxygen_system = mb.Compound(carbon_system_list)
carbon_oxygen_system.box = mb.Box([1,1,1])
for i, child in enumerate(carbon_oxygen_system.children):
if i%2 == 0:
child.name='O'
child.element='O'
carbon_oxygen_system.freud_generate_bonds(name_a= "C", name_b="O",
dmin=0.0, dmax=0.16, exclude_ii=True)
print(carbon_oxygen_system.n_bonds)
This ends up with 45 bonds. Since the only thing that changed is the labeling, these results should agree. If we change excluded_ii = False, we will recover 54 bonds. This seems counter intuitive because exclude_ii is design to prevent us from accidentally trying to make a bond between a particle and itself.
The issue arises in the code starting on line 1363 in compound.py
a_indices = []
b_indices = []
for i, part in enumerate(self.particles()):
if part.name == name_a:
a_indices.append(i)
if part.name == name_b:
b_indices.append(i)
aq = freud.locality.AABBQuery(freud_box, moved_positions[b_indices])
nlist = aq.query(
moved_positions[a_indices],
dict(
r_min=dmin,
r_max=dmax,
exclude_ii=exclude_ii,
),
).toNeighborList()
In the first example where we only had carbon atoms, the indices added to 'a_indices' and 'b_indices' are identical. As such, the position array (moved_positions) we pass to set up aq and aq.query are identical (thus we need exclude_ii = True). In the second case, these indices are different. However, when we pass them to freud we lose these actual indices since we are passing as "moved_positions[a_indices]". So if we had a system with 10 carbons and 10 oxygens, the indices that end up being internal to freud are going to be from 0 to 9 in both cases. This is why we can recover the correct answer when we set exclude_ii = False.
There are a few options to fix this, but the easiest is probably to remove the ability of the user to toggle exclude_ii, and instead of it dynamically set.
exclude_ii = True when name_a == name_b
otherwise exclude_ii = False
We could also modify this such that we pass aq.query moved_positions[b_indices+a_indices] (with exclude_ii = True), to ensure that the a_indices will be given unique indices inside freud. But we'd only want to do this if name_a != named_b, so it seems simpler to just adjust exclude_ii .