mosdef-hub/mbuild

Wrong atomtypes when saving compound

uppittu11 opened this issue · 3 comments

Bug summary
I recently updated mbuild from a very old version (I don't recall which version) and ran into an issue where the wrong atom types were saved in my hoomdxml output file. I created a test system below to reproduce the issue. The GSD, hoomdxml, and lmp files all have the wrong atom types for the last 10 atoms in the attached output files (mbuild_outputs.zip). This doesn't seem like the intended output, but maybe I'm doing something wrong in this workflow. My mbuild, foyer, and gmso repos are all up to date at the time of this issue.

Is this only on my computer or does anyone else see the same results?

Code to reproduce the behavior

import mbuild as mb
import mdtraj as md

# Save out a ff.xml file for this test case.
# Has 2 atom types: C and H.
ff_file = """
<ForceField>
 <AtomTypes>
 <Type name="TypeA" class="TypeA" element="_TypeA" mass="1" def="[_TypeA]" 
     desc="TypeA atom"/>
 <Type name="TypeB" class="TypeB" element="_TypeB" mass="2" def="[_TypeB]" 
     desc="TypeB atom"/>
 </AtomTypes>
</ForceField>
"""
with open("ff.xml", "w") as f:
    f.write(ff_file)
    
    
# Initialized two atoms
typeA = mb.Compound(name="_TypeA")
typeB = mb.Compound(name="_TypeB")

# Add add one A and B to a container Compound.
subcontainer1 = mb.Compound()
subcontainer1.add(mb.clone(typeA))
subcontainer1.add(mb.clone(typeB))

# Add add only one A to another container Compound.
subcontainer2 = mb.Compound()
subcontainer2.add(mb.clone(typeA))

# Create a container compound with 5 copies of subcontainer1
container1 = mb.Compound()
for _ in range(5):
    container1.add(mb.clone(subcontainer1))

# Create another container compound with 10 copies of subcontainer2
container2 = mb.Compound()
for _ in range(10):
    container2.add(mb.clone(subcontainer2))
    
# Make a top level compound containing both container1 and container2
system = mb.Compound()
system.add(container1)
system.add(container2)


# Save the file and apply the force field
system.save("out.gsd", forcefield_files="ff.xml", overwrite=True,)
system.save("out.hoomdxml", forcefield_files="ff.xml", overwrite=True,)
system.save("out.lmp", forcefield_files="ff.xml", overwrite=True,)

# Compare the atom names from the system in memory and written file 
traj = md.load("out.gsd")
print("Actual\tFrom file")
for atom, atom_saved in zip(system.particles(), traj.top.atoms):
    print(f"{atom.name}\t{atom_saved.name}")

Output:

No urey bradley terms detected, will use angle_style harmonic
Actual	From file
_TypeA	TypeA
_TypeB	TypeB
_TypeA	TypeA
_TypeB	TypeB
_TypeA	TypeA
_TypeB	TypeB
_TypeA	TypeA
_TypeB	TypeB
_TypeA	TypeA
_TypeB	TypeB
_TypeA	TypeA
_TypeA	TypeB
_TypeA	TypeA
_TypeA	TypeB
_TypeA	TypeA
_TypeA	TypeB
_TypeA	TypeA
_TypeA	TypeB
_TypeA	TypeA
_TypeA	TypeB

Software versions

  • Which version of mBuild are you using? (python -c "import mbuild as mb; print(mb.__version__)") : 0.16.1
  • Which version of Python (python --version)? 3.10.10
  • Which operating system? MacOS Ventura (ARM64)

I will test this out and see if I can identify the issue.

I was able to reproduce the issue you have @uppittu11. Turn out this have to due with foyer Forcefield.apply, specifically the use_residue_map option. This option supposed to speed up the atomtyping and parametrization step by grouping residue by name (without checking underlying components), so when you create two compounds container with same name but different underlying particles, it will only atomtype the first one and copy the type info over to the other and cause the issue you posted above. I opened a PR in foyer to change the default of the use_residue_map to False since the user should be aware of the catch when deciding to use that speed up.

To fix your case right now (without having to wait for the foyer PR), you can supply the foyer_kwargs ({"use_residue_map": False}) which will be passed to the atomtyping step.

Setting use_residue_map to false by default is a good quick change. We might add some sort of minimal checking beyond the name, e.g., similar to print_hierarchy where it compares a string representation of some of the underlying characteristics. I feel like a small, extra comparison would not add much computational overhead.