lammps/lammps

[BUG] Force spike for some ReaxFF parametrizations for bond order ~10^-8

mkanski opened this issue · 18 comments

Summary

The current implementation of ReaxFF in LAMMPS causes a huge spike in force when bond order is about 10^-8 for some parametrizations.

LAMMPS Version and Platform

Current develop

Expected Behavior

The force between atoms should be monotonic (more or less) when bond order approaches zero.

Actual Behavior

The force increases by orders of magnitude when the bond order between atoms is close to 10^-8 for some parametrizations. This causes a sudden increase in kinetic energy when atoms move apart. This bug is not always triggered because the range of distances where the force spikes is quite narrow (and parametrization-specific).

Steps to Reproduce

Use the input script and the ReaxFF parametrization attached at the end.

Further Information, Files, and Links

The graph below plots the absolute value of the force (so it can be shown on a log scale) vs. bond length. As you can see, the force increases toward (almost) infinity at a specific interatomic distance.

[figure: |force| (log scale) vs. bond length, showing the spike at a specific interatomic distance]

The issue comes from this line:

DlpVi = 1.0 / (Delta_lpcorr + sbp_i->valency + 1e-8);

For very weak bonds, Delta_lpcorr + sbp_i->valency is close to -1e-8 which causes DlpVi to approach infinity.
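
To illustrate the cancellation numerically, here is a minimal standalone C++ sketch (not LAMMPS code; the valency of 1.0 is the Au value from the attached force field):

// Standalone illustration: as Delta_lpcorr approaches -valency - 1e-8, the
// denominator of DlpVi crosses zero, so DlpVi (and the force derived from it)
// blows up.
#include <cstdio>

int main()
{
    const double valency = 1.0;   // Au valency in the attached parametrization
    for (int i = -4; i <= 4; ++i) {
        // sweep Delta_lpcorr through the singular point in steps of 5e-9
        double Delta_lpcorr = -valency - 1e-8 + i * 5e-9;
        double DlpVi = 1.0 / (Delta_lpcorr + valency + 1e-8);
        std::printf("Delta_lpcorr + valency = % .3e   DlpVi = % .3e\n",
                    Delta_lpcorr + valency, DlpVi);
    }
    return 0;
}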

I believe this part of the code needs to be rewritten so that the force from the overcoordination term is zeroed for very weakly bonded atoms.
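
A literal sketch of that suggestion (not a tested patch against the LAMMPS sources; bo_tol is a hypothetical tolerance whose value would still have to be chosen, and a real fix would need to smooth the cutoff so it does not introduce a discontinuity of its own):

// Hypothetical guard: below some small tolerance bo_tol, treat the pair as
// effectively unbonded and return DlpVi = 0 so the overcoordination term
// contributes neither energy nor force, instead of evaluating
// 1/(Delta_lpcorr + valency + 1e-8), whose denominator can cancel.
double overcoordination_DlpVi(double Delta_lpcorr, double valency, double bo_tol)
{
    // Delta_lpcorr + valency approaches (and can slightly undershoot) zero
    // for very weakly bonded atoms
    const double base = Delta_lpcorr + valency;
    if (base < bo_tol) return 0.0;
    return 1.0 / (base + 1e-8);
}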

Files:

Input file:

atom_style      full
units           real

region          u sphere 0 0 0 30
create_box      2 u

mass            1 197
mass            2 12.011

create_atoms    1 single 0 0 1.4782238
group           move id 1
create_atoms    2 single 0 0 0

pair_style      reaxff NULL checkqeq no
pair_coeff      * * ffield.reax.Monti Au C

variable        fz equal abs(fz[1])
variable        r equal z[1]

thermo_style    custom v_r v_fz pe
thermo          1

fix             move move move linear 0 0 1
timestep        0.002

run             2400

Parametrization:

Reactive MD-force field for CYS on Gold 2015
39 ! Number of general parameters
 50.0000 !Overcoordination parameter
 9.5469 !Overcoordination parameter
 1.6725 !Valency angle conjugation parameter
 1.7224 !Triple bond stabilisation parameter
 6.8702 !Triple bond stabilisation parameter
 60.4850 !C2-correction
 1.0588 !Undercoordination parameter
 4.6000 !Triple bond stabilisation parameter
 12.1176 !Undercoordination parameter
 13.3056 !Undercoordination parameter
 -40.0000 !Triple bond stabilization energy
 0.0000 !Lower Taper-radius
 10.0000 !Upper Taper-radius
 2.8793 !Not used
 33.8667 !Valency undercoordination
 6.0891 !Valency angle/lone pair parameter
 1.0563 !Valency angle
 2.0384 !Valency angle parameter
 6.1431 !Not used
 6.9290 !Double bond/angle parameter
 0.3989 !Double bond/angle parameter: overcoord
 3.9954 !Double bond/angle parameter: overcoord
 -2.4837 !Not used
 5.7796 !Torsion/BO parameter
 10.0000 !Torsion overcoordination
 1.9487 !Torsion overcoordination
 -1.2327 !Conjugation 0 (not used)
 2.1645 !Conjugation
 1.5591 !vdWaals shielding
 0.1000 !Cutoff for bond order (*100)
 1.7602 !Valency angle conjugation parameter
 0.6991 !Overcoordination parameter
 50.0000 !Overcoordination parameter
 1.8512 !Valency/lone pair parameter
 0.5000 !Not used
 20.0000 !Not used
 5.0000 !Molecular energy (not used)
 0.0000 !Molecular energy (not used)
 0.7903 !Valency angle conjugation parameter
 2 ! Nr of atoms; cov.r; valency;a.m;Rvdw;Evdw;gammaEEM;cov.r2;#
 alfa;gammavdW;valency;Eunder;Eover;chiEEM;etaEEM;n.u.
 cov r3;Elp;Heat inc.;n.u.;n.u.;n.u.;n.u.
 ov/un;val1;n.u.;val3,vval4
 C      1.3817   4.0000  12.0000   1.8903   0.1838   0.7028   1.1341    4.0000
        9.7559   2.1346   4.0000  34.9350  79.5548   5.2323   6.0000    0.0000
        1.2114   0.0000 202.2908   8.9539  34.9289  13.5366   0.8563    0.0000
       -2.8983   2.5000   1.0564   4.0000   2.9663   0.0000   0.0000    0.0000
 Au     2.0271   1.0000 196.9665   2.2078   0.3446   0.5126  -1.0000    1.0000
       11.9754   2.0434   1.0000   0.0000   0.0000   1.0082   8.9305    0.0000
       -1.0000   0.0000  92.5070   6.2293   5.2294   0.1542   0.8563    0.0000
       -1.0000   2.9867   1.0338   6.2998   2.5791   0.0000   0.0000    0.0000
 3 ! Nr of bonds; Edis1;LPpen;n.u.;pbe1;pbo5;13corr;pbo6
 pbe2;pbo3;pbo4;Etrip;pbo1;pbo2;ovcorr
 1 1  158.2004  99.1897  78.0000  -0.7738  -0.4550   1.0000  37.6117    0.4147
        0.4590  -0.1000   9.1628   1.0000  -0.0777   6.7268   1.0000    0.0000
 1 2   85.5942   0.0000   0.0000   0.3650  -0.2000   0.0000  16.0000    0.1488
        0.1726  -0.2000  15.0000   1.0000  -0.1380   5.9607   0.0000    0.0000
 2 2  146.6542   0.0000   0.0000  -0.0295  -0.2000   0.0000  16.0000    0.3319
        0.2793  -0.2000  15.0000   1.0000  -0.1591   5.3892   0.0000    0.0000
 1 ! Nr of off-diagonal terms; Ediss;Ro;gamma;rsigma;rpi;rpi2
 1 2    0.1910   2.0244   9.6929   1.8040  -1.0000  -1.0000
 3  ! Nr of angles;at1;at2;at3;Thetao,o;ka;kb;pv1;pv2
 1 1 1 59.0573  30.7029   0.7606   0.0000   0.7180   6.2933   1.1244
 1 1 2 58.3918  13.9641   2.0300   0.0000   1.2404   0.0000   2.2787
 1 2 1 71.3861   3.9232   2.1478   0.0000   1.1259   0.0000   2.1341
 1 ! Nr of torsions;at1;at2;at3;at4;;V1;V2;V3;V2(BO);vconj;n.u;n
 1 1 1 1 -0.2500 34.7453 0.0288 -6.3507 -1.6000 0.0000 0.0000
 0 ! Nr of hydrogen bonds;at1;at2;at3;Rhb;Dehb;vhb1

Thanks for reporting this issue. We will look into it and get back to you, hopefully soon :)

It would be interesting to know if this also happens with the Kokkos version.

It would be interesting to know if this also happens with the Kokkos version.

Yes, it would, since the two versions are numerically equivalent: https://github.com/lammps/lammps/blob/develop/src/KOKKOS/pair_reaxff_kokkos.cpp#L2351.

Sorry for the radio silence.

We investigated this issue. It seems that our implementation (in ReaxC and PuReMD) matches Dr. Adri van Duin's original Fortran code with regard to this energy term.

I should clarify that the expected behavior is not for the force between atoms to go monotonically to zero. Since this is the lone pair penalty term, the general shape of the force/energy term is as expected, with the exception of the spike and the discontinuity immediately after it.

As discussed above, this happens because the total bond order attains a negative value, i.e. close to -1e-8. This obviously should not happen, but it was not immediately clear to us why it does. It could be that values outside the physical range were used for certain bond-order-related parameters (note that this force field was developed independently by Monti et al.), but we are not sure which parameters are involved or what their valid ranges should be. Or the issue could stem from the numerical thresholding and corrections applied to bond orders, but it was not easy for us to track the actual code path, and even if we had managed to, it was not clear to us how to modify the code to fix this issue.

We notified Dr. van Duin of this issue and will provide a fix/update when we hear back.

@hasanmetin any updates?

Got distracted with other things. We are looking into it again.

We have not been able to reproduce the issue raised here. We ran the input files given here using Update 2 of the LAMMPS stable release from August 2, 2023. The compilers and libraries used for building LAMMPS were GCC/11.2.0, OpenMPI/4.1.1-CUDA-11.8.0, and CUDA/11.8.0. We ran the test case for both the serial and Kokkos builds. The relevant piece of the output from the serial version is given below. As can be seen, there are no spikes.

3.4002238 2.0734264 -2.7987191
3.4022238 2.0316915 -2.7946141
3.4042238 1.9905972 -2.7905919
3.4062238 1.9501379 -2.7866513
3.4082238 1.9103084 -2.7827909
3.4102238 1.8711033 -2.7790096
3.4122238 1.8325174 -2.7753061
3.4142238 1.7945455 -2.7716792
3.4162238 1.7571825 -2.7681275
3.4182238 1.7204235 -2.76465
3.4202238 1.6842636 -2.7612454
3.4222238 1.6486981 -2.7579126
3.4242238 1.6137223 -2.7546503
3.4262238 1.5793319 -2.7514573
3.4282238 1.5455226 -2.7483325
3.4302238 1.5122904 -2.7452748
3.4322238 1.4796316 -2.742283
3.4342238 1.4475428 -2.7393559
3.4362238 1.4160208 -2.7364924
3.4382238 1.3850631 -2.7336915
3.4402238 1.3546676 -2.7309518
3.4422238 1.3248326 -2.7282724
3.4442238 1.2955576 -2.7256521
3.4462238 1.2668427 -2.7230898
3.4482238 1.2386894 -2.7205844
3.4502238 1.2111006 -2.7181347
3.4522238 1.1840811 -2.7157396
3.4542238 1.1576384 -2.713398
3.4562238 1.1317834 -2.7111086
3.4582238 1.1065315 -2.7088704
3.4602238 1.081905 -2.7066821
3.4622238 1.0579357 -2.7045424
3.4642238 1.0346703 -2.7024499
3.4662238 1.0121791 -2.7004032
3.4682238 0.99057254 -2.6984006
3.4702238 0.97003673 -2.6964402
3.4722238 0.95091729 -2.6945195
3.4742238 0.9339718 -2.6926351
3.4762238 0.92149837 -2.6907808
3.4782238 0.98072492 -2.6889344
3.4802238 0.46199919 -2.6898625
3.4822238 0.45788697 -2.6907824
3.4842238 0.45379965 -2.6916941
3.4862238 0.44973711 -2.6925976
3.4882238 0.44569921 -2.693493
3.4902238 0.44168581 -2.6943804
3.4922238 0.43769681 -2.6952598
3.4942238 0.43373205 -2.6961312
3.4962238 0.42979143 -2.6969947
3.4982238 0.4258748 -2.6978504
3.5002238 0.42198204 -2.6986982
3.5022238 0.41811304 -2.6995383
3.5042238 0.41426765 -2.7003707

I should add that the default compiler options were used for building LAMMPS. I suspect the issue observed here is due to the compiler options used. @mkanski Could you please share the compiler and options you used to build LAMMPS?

I can reproduce this issue with static binaries for the LAMMPS version you mentioned (link: https://github.com/lammps/lammps/releases/download/stable_2Aug2023_update2/lammps-linux-x86_64-2Aug2023_update2.tar.gz) on two separate computers.

In my original test, I didn't change any compiler options; I compiled LAMMPS with these commands:

cmake -D PKG_REAXFF=yes -D PKG_MOLECULE=yes ../cmake
make 

The output of lmp -h is as follows (for a more recent LAMMPS version, which also reproduces the issue):

Large-scale Atomic/Molecular Massively Parallel Simulator - 7 Feb 2024 - Development
Git info (develop / patch_7Feb2024_update1-275-g207a14f351)

Usage example: ./lmp -var t 300 -echo screen -in in.alloy

List of command line options supported by this LAMMPS executable:

-echo none/screen/log/both  : echoing of input script (-e)
-help                       : print this help message (-h)
-in none/filename           : read input from file or stdin (default) (-i)
-kokkos on/off ...          : turn KOKKOS mode on or off (-k)
-log none/filename          : where to send log output (-l)
-mdi '<mdi flags>'          : pass flags to the MolSSI Driver Interface
-mpicolor color             : which exe in a multi-exe mpirun cmd (-m)
-cite                       : select citation reminder style (-c)
-nocite                     : disable citation reminder (-nc)
-nonbuf                     : disable screen/logfile buffering (-nb)
-package style ...          : invoke package command (-pk)
-partition size1 size2 ...  : assign partition sizes (-p)
-plog basename              : basename for partition logs (-pl)
-pscreen basename           : basename for partition screens (-ps)
-restart2data rfile dfile ... : convert restart to data file (-r2data)
-restart2dump rfile dgroup dstyle dfile ... 
                            : convert restart to dump file (-r2dump)
-reorder topology-specs     : processor reordering (-r)
-screen none/filename       : where to send screen output (-sc)
-skiprun                    : skip loops in run and minimize (-sr)
-suffix gpu/intel/kk/opt/omp: style suffix to apply (-sf)
-var varname value          : set index style variable (-v)

OS: Linux "Linux Mint 21.2" 6.2.0-39-generic x86_64

Compiler: GNU C++ 11.4.0 with OpenMP 4.5
C++ standard: C++11
MPI v3.1: Open MPI v4.1.4, package: Open MPI michal@IFUJ-PC Distribution, ident: 4.1.4, repo rev: v4.1.4, May 26, 2022

Accelerator configuration:


FFT information:

FFT precision  = double
FFT engine  = mpiFFT
FFT library = KISS

Active compile time flags:

-DLAMMPS_GZIP
-DLAMMPS_PNG
-DLAMMPS_JPEG
-DLAMMPS_FFMPEG
-DLAMMPS_SMALLBIG
sizeof(smallint): 32-bit
sizeof(imageint): 32-bit
sizeof(tagint):   32-bit
sizeof(bigint):   64-bit

Available compression formats:

Extension: .gz     Command: gzip
Extension: .bz2    Command: bzip2
Extension: .zst    Command: zstd
Extension: .xz     Command: xz
Extension: .lzma   Command: xz
Extension: .lz4    Command: lz4


Installed packages:

MOLECULE REAXFF 

Moreover, I'm getting different force and energy values than you:

3.4722238      1.019615      -6.1193348
3.4742238      1.1033095     -6.1172324
3.4762238      1.6329903     -6.1146779
3.4782238      3917221.2     -2.7478382
3.4802238      0.46199919    -2.6898625
3.4822238      0.45788697    -2.6907824
3.4842238      0.45379965    -2.6916941
3.4862238      0.44973711    -2.6925976
3.4882238      0.44569921    -2.693493

Just FYI, when the issue was originally created I ran this myself with both the serial and Kokkos versions and got the same result (all vanilla CMake, nothing special). I even recreated the pretty picture...

Interesting... it looks like we are using basically the same compiler (GCC 11.4 vs. GCC 11.2) and the same compiler options. So is this due to the CPU architecture, then? We will test it on another machine and see what we get. I would also like to try this with our stand-alone ReaxFF implementation (https://github.com/msu-sparta/PuReMD). Could you please let me know exactly at which distance you are seeing the large force value (over 1,000,000)?

Hi @mkanski, I'm one of the people working with @hasanmetin on this.

The builds and tests I ran (which did not reproduce the issue) were on the MSU High Performance Computing Center. Specifically, the builds use the commit tagged for Update 2 of the 2 August 2023 stable release (27e8d0f). Build instructions are below.

module purge && module load GCC/11.2.0 OpenMPI/4.1.1-CUDA-11.8.0 CUDA/11.8.0 CMake/3.22.1
git clone https://github.com/lammps/lammps.git
cd lammps
mkdir build_mpi_kokkos_cuda_broadwell_kepler_k80_stable_2Aug2023_update2 && cd build_mpi_kokkos_cuda_broadwell_kepler_k80_stable_2Aug2023_update2
cmake -C ../cmake/presets/gcc.cmake -DCMAKE_CXX_FLAGS_DEBUG="-Wall -Og -g -std=c++14" -DCMAKE_CXX_FLAGS_RELWITHDEBINFO="-g -O2 -DNDEBUG -std=c++14" -DCMAKE_CXX_FLAGS_RELEASE="-O3 -DNDEBUG -std=c++14" -DBUILD_OMP=yes -DBUILD_MPI=yes -DPKG_KOKKOS=yes -DKokkos_ENABLE_SERIAL=yes -DKokkos_ENABLE_OPENMP=yes -DKokkos_ENABLE_CUDA=yes -DKokkos_ARCH_BDW=yes -DKokkos_ARCH_KEPLER37=yes -DKokkos_ENABLE_DEPRECATION_WARNINGS=no -DCMAKE_CXX_COMPILER=${HOME}/lammps/lib/kokkos/bin/nvcc_wrapper -DPKG_REAXFF=yes -DPKG_QEQ=yes -DPKG_MOLECULE=yes ../cmake

The reason for the targeted CPU / GPU architectures with the Kokkos build is that we have many different clusters at MSU HPCC (see https://docs.icer.msu.edu/Cluster_Resources/). Specifically, the tests were run on the Intel16 nodes (2x Intel Xeon E5-2680 v4 (Broadwell) CPU, 8x NVIDIA Kepler K80 GPU).

Based on the build instructions, two thoughts come to mind:

  1. I'm a bit surprised that your build instructions do not enable the QEq package. Can you try retesting with this enabled?
  2. The CMake presets for GCC and the C++14 standard also jump out as potential issues. Can you also look into using these?

Also, @mkanski, what hardware are your tests failing on? This is with the non-Kokkos CPU version, correct?

@hasanmetin
The distance at which the large force appears is, for example, 3.4782238. More generally, the issue occurs when the bond order is between about 1e-8 and 1e-10 (the point at which the bonded interactions are turned off completely). Below I'm attaching a plot with a much finer dr:

[figure: |force| vs. bond length with a much finer dr, resolving the spike]

@ohearnk
Hi,
Thanks for looking into this issue. I cannot follow your instructions precisely because I don't have access to a cluster with the specified modules. On my current machine I have:

gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Open MPI 4.1.4
Cuda compilation tools, release 11.5, V11.5.119

I had to modify the cmake line as follows (changing the GPU architecture and the nvcc_wrapper path):

cmake -C ../cmake/presets/gcc.cmake -DCMAKE_CXX_FLAGS_DEBUG="-Wall -Og -g -std=c++14" -DCMAKE_CXX_FLAGS_RELWITHDEBINFO="-g -O2 -DNDEBUG -std=c++14" -DCMAKE_CXX_FLAGS_RELEASE="-O3 -DNDEBUG -std=c++14" -DBUILD_OMP=yes -DBUILD_MPI=yes -DPKG_KOKKOS=yes -DKokkos_ENABLE_SERIAL=yes -DKokkos_ENABLE_OPENMP=yes -DKokkos_ENABLE_CUDA=yes -DKokkos_ARCH_BDW=yes -DKokkos_ARCH_AMPERE86=yes -DKokkos_ENABLE_DEPRECATION_WARNINGS=no -DCMAKE_CXX_COMPILER=$(pwd)/../lib/kokkos/bin/nvcc_wrapper -DPKG_REAXFF=yes -DPKG_QEQ=yes -DPKG_MOLECULE=yes ../cmake

With the new settings the issue still persists, both on the CPU (without KOKKOS) and on the GPU (with KOKKOS). I would also like to point out that I can reproduce this with the static executable provided by the LAMMPS developers (the link is in my previous post); can you try using it?

Tested CPUs:
AMD Ryzen Threadripper PRO 3955WX 16-Cores (latest LAMMPS version compiled as mentioned above)
Intel(R) Xeon(R) CPU E5-2670 v3 (LAMMPS (2 Aug 2023 - Update 2) - static executable)
Intel(R) Xeon(R) Gold 6240 (LAMMPS (2 Aug 2023 - Update 2) - static executable)

Tested GPU - NVIDIA GeForce RTX 3060 (latest LAMMPS version compiled as mentioned above)

The fact that this is showing up for some testers (@mkanski @rbberger) and not for others (@ohearnk) suggests that it may be due to uninitialized data feeding the expression for DlpVi. Perhaps the value of sbp_i->valency is different in the two cases?

The fact that this is showing up for some testers (@mkanski @rbberger) and not for others (@ohearnk) suggests that it may be due to uninitialized data feeding the expression for DlpVi. Perhaps the value of sbp_i->valency is different in the two cases?

If you are running the exact same code with the exact same input, you should get the exact same results unless the code is miscompiled or there is an uninitialized data access, which can be detected with valgrind's memcheck tool.

This is why we provide the static (and thus portable) pre-compiled Linux binaries for LAMMPS that @mkanski was referring to. For the latest stable version that is: https://github.com/lammps/lammps/releases/download/stable_2Aug2023_update3/lammps-linux-x86_64-2Aug2023_update3.tar.gz
and for the latest feature release version that is:
https://github.com/lammps/lammps/releases/download/patch_7Feb2024/lammps-linux-x86_64-7Feb2024.tar.gz

Using these (serial with OpenMP) binaries, which will run on any Linux x86_64 system, eliminates all variation from personal compilations, different compilers, different compilation settings, etc., and thus helps make results more reproducible. I strongly support the suggestion to first test with these binaries and, based on the outcome, determine the best strategy for identifying the source of the undesired behavior.

After quite a bit of digging, @ohearnk (thank you!) has found that the discrepancy is due to the force field files used. We were using the force field from Adri's archives, which is identical to the one supplied here except for one parameter. Specifically, the two files differ in the over/undercoordination parameter (p_ovun2), which is the first parameter on the fourth row (of the Au atom entry):

Force field from this LAMMPS issue:

-1.0000 2.9867 1.0338 6.2998 2.5791 0.0000 0.0000 0.0000

Dr. van Duin's force field:

-24.7561 2.9867 1.0338 6.2998 2.5791 0.0000 0.0000 0.0000

According to the SI of the cited publication, the value of this parameter should be -24.7561. We are not sure how or why it was changed to -1.0000. Quoting Adri: "This makes a very big difference, and -1.000 is not an acceptable value for this parameter. When I use -1.00 I indeed see a big energy jump of 4 kcal/mol around an Au-C bond distance of 3.5 Angstrom and I do not get good energy conservation in an NVE simulation."

Yay! That makes sense. I'm guessing someone changed that value and then tried to restore it, but incorrectly used the value from one line up.

Thanks @ohearnk and @hasanmetin for figuring this out.