mosdef-hub/mbuild

Potential bug with freud_generate_bonds

daico007 opened this issue · 2 comments

Bug summary

Found this bug when working on #1124. When I tried to replicate the model and regenerate bonds, the original Compound.generate_bonds gives different answer from Compound.freud_generate_bonds. When compare with the correct value, the first is correct (generate_bonds) while the latter is 1 short.

Code to reproduce the behavior

Please include a code snippet that can be used to reproduce this bug.

"""Beta-cristobalite surface."""
import itertools
import math

import numpy as np

import mbuild as mb

compound = mb.Compound(name="SilicaInterface")
dimensions = None

betacristabolite = mb.load(
    "beta-cristobalite-expanded.mol2",
    backend="gmso",
)

# Shift it back to origin
xs, ys, zs = list() , list(), list()
for particle in betacristabolite.particles(): 
    xs.append(particle.pos[0])
    ys.append(particle.pos[1])
    zs.append(particle.pos[2])
betacristabolite.translate((-min(xs), -min(ys), -min(zs)))

compound.periodicity = (True, True, False)
# 1.3200 taken from boundingbox length rounded to 4 decimal places
ref_dims = betacristabolite.get_boundingbox().lengths #[5.3888, 4.6669, 1.3200]
if not dimensions:
    box = mb.Box(ref_dims)
elif isinstance(dimensions, (list, tuple)) and len(dimensions) == 2:
    box = mb.Box([dimensions[0], dimensions[1], ref_dims[2]])
elif isinstance(dimensions, (list, tuple)) and len(dimensions) == 3:
    box = mb.Box(dimensions)
else:
    raise ValueError(
        "Unsupported value for dimension provided. "
        "Only accept `None` (default), list/tuple of 2 (x and y dimension) "
        "or list/tuple of 3(x, y, and z dimensions)"
    )

compound.box = box

# Borrowed code from water_box recipe
scale_Lx = math.ceil(box.Lx / ref_dims[0])
scale_Ly = math.ceil(box.Ly / ref_dims[1])
scale_Lz = math.ceil(box.Lz / ref_dims[2])

silica_list = list()
missing = list()
for particle in list(betacristabolite.particles()):
    for i, j, k in itertools.product(
        range(scale_Lx),
        range(scale_Ly),
        range(scale_Lz),
    ):
        shift = np.array(
            [
                i * ref_dims[0],
                j * ref_dims[1],
                k * ref_dims[2],
            ]
        )
        if all(particle.pos + shift <= box.lengths):
            temp = mb.clone(particle)
            temp.translate(shift)
            silica_list.append(temp)
        else: 
            missing.append(particle)
compound.add(silica_list)

count = 0
for particle in list(compound.particles()):
    if particle.name.startswith("O") and particle.pos[2] >= box.Lz:
        count += 1
        port = mb.Port(
            anchor=particle, orientation=[0, 0, 1], separation=0.1
        )
        compound.add(port, "port_{}".format(count))
        particle.name = "O"  # Strip numbers required in .mol2 files.
    elif particle.name.startswith("O"):
        particle.name = "O"
    elif particle.name.startswith("Si"):
        particle.name = "Si"

From here, if the original generate_bonds() is used:

compound.generate_bonds(name_a='Si', name_b='O', dmin=0, dmax=0.17)
assert compound.n_bonds == betacristabolite.n_bonds # True as both  is 2400

If the freud_generate_bonds() is used:

compound.freud_generate_bonds(name_a='Si', name_b='O', dmin=0, dmax=0.17)
assert compound.n_bonds == betacristabolite.n_bonds # False, the first will return 2399 while the latter is 2400

I had this same problem in one of the demos I put together a while ago (same issue, one bond shy). This popped up for me after updating my conda environment at some point. I think it is a freud issue, not necessarily mbuild. I have a really minimal example...let me see if I can find a version of freud that provides the right answer and can submit a bug to them.

After a bit of digging, it is not a freud issue, but rather how we are passing particles to query when we are looking to make bonds between two different species.

For example, consider the following bit of code. A simple cubic lattice of particles (we will label each as carbon for simplicity, albeit not realistic).

import mbuild as mb

carbon_atom = mb.Compound(name='C', element='C')

grid_pattern = mb.Grid3DPattern(3, 3, 3)

grid_pattern.scale([0.45, 0.45, 0.45])
carbon_system_list = grid_pattern.apply(carbon_atom)
carbon_system = mb.Compound(carbon_system_list)
carbon_system.box = mb.Box([1,1,1])

carbon_system.freud_generate_bonds(name_a= "C", name_b="C", 
                                   dmin=0.0, dmax=0.16, exclude_ii=True)

print(carbon_system.n_bonds)

This will yield 54 bonds.

If we take the same basic code and initialize every other atom as oxygen, we can see the problems arise:

carbon_atom = mb.Compound(name='C', element='C')

grid_pattern = mb.Grid3DPattern(3, 3, 3)

grid_pattern.scale([0.45, 0.45, 0.45])
carbon_system_list = grid_pattern.apply(carbon_atom)
carbon_oxygen_system = mb.Compound(carbon_system_list)
carbon_oxygen_system.box = mb.Box([1,1,1])

for i, child in enumerate(carbon_oxygen_system.children): 
    if i%2 == 0:
        child.name='O'
        child.element='O'

carbon_oxygen_system.freud_generate_bonds(name_a= "C", name_b="O", 
                                   dmin=0.0, dmax=0.16, exclude_ii=True)

print(carbon_oxygen_system.n_bonds)

This ends up with 45 bonds. Since the only thing that changed is the labeling, these results should agree. If we change excluded_ii = False, we will recover 54 bonds. This seems counter intuitive because exclude_ii is design to prevent us from accidentally trying to make a bond between a particle and itself.

The issue arises in the code starting on line 1363 in compound.py

        a_indices = []
        b_indices = []
        for i, part in enumerate(self.particles()):
            if part.name == name_a:
                a_indices.append(i)
            if part.name == name_b:
                b_indices.append(i)

        aq = freud.locality.AABBQuery(freud_box, moved_positions[b_indices])


        nlist = aq.query(
            moved_positions[a_indices],
            dict(
                r_min=dmin,
                r_max=dmax,
                exclude_ii=exclude_ii,
            ),
        ).toNeighborList()

In the first example where we only had carbon atoms, the indices added to 'a_indices' and 'b_indices' are identical. As such, the position array (moved_positions) we pass to set up aq and aq.query are identical (thus we need exclude_ii = True). In the second case, these indices are different. However, when we pass them to freud we lose these actual indices since we are passing as "moved_positions[a_indices]". So if we had a system with 10 carbons and 10 oxygens, the indices that end up being internal to freud are going to be from 0 to 9 in both cases. This is why we can recover the correct answer when we set exclude_ii = False.

There are a few options to fix this, but the easiest is probably to remove the ability of the user to toggle exclude_ii, and instead of it dynamically set.

exclude_ii = True when name_a == name_b
otherwise exclude_ii = False

We could also modify this such that we pass aq.query moved_positions[b_indices+a_indices] (with exclude_ii = True), to ensure that the a_indices will be given unique indices inside freud. But we'd only want to do this if name_a != named_b, so it seems simpler to just adjust exclude_ii .