openmopac/mopac

IRC / DRC Segfault

Closed this issue · 23 comments

Getting this with on the latest github pull, compiled on Linux as always.

Program received signal SIGBUS: Access to an undefined portion of a memory object.

Backtrace for this error:

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
Segmentation fault

The files to reproduce the problem are in this tarball: files.tar.gz
mopac transu.mop

Also, I am getting following with SADDLE when using GEO_DAT and GEO_REF.

munmap_chunk(): invalid pointer

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
#0  0x7fe710a83bfa
#1  0x7fe710a82e25
#2  0x7fe7108dad5f
#3  0x7fe7108dace1
#4  0x7fe7108c4536
#5  0x7fe71091d767
#6  0x7fe710924a59
#7  0x7fe710924d2b
#8  0x7fe710cac6e6
#9  0x7fe710cbca67
#10  0x7fe710f87225
#11  0x7fe710f52b67
#12  0x7fe710d7e418
#13  0x55fa9c0620a7
#14  0x7fe7108c5d09
#15  0x55fa9c0620d9
#16  0xffffffffffffffff
Aborted

If both geometries are used in the data-set, the calculation finishes. The files to reproduce this problem is in this tarball:
files2.tar.gz

Also, the problem of using CAPITAL letters as described in #45 persists. If you try to lowercase both the GEO_DAT and GEO_REF files individually or simultaneously, "does not exist" -error is produced...

Except for the persisting filename case issue (thank you for a more minimal example of that problem, BTW), these both seem like memory allocation errors in the reaction coordinate / saddle-point functionality of MOPAC. Prior to making this repo public, there was extensive testing and fixing of memory allocation problems in MOPAC, but that was with respect to the tests available at the time, and the testing coverage of MOPAC was and remains limited (although we have been slowly expanding it).

This type of memory error is usually straightforward to hunt down, and I should be able to resolve it in a timely manner.

As an aside - with your apparent interest in transition states, would you be interested in using the Nudged Elastic Band (NEB) method if it were available in MOPAC? NEB should be accessible to MOPAC later this year through integration with MDI, which will enable LAMMPS to perform NEB calculations on potential energy surfaces defined by MOPAC calculations.

As an aside - with your apparent interest in transition states, would you be interested in using the Nudged Elastic Band (NEB) method if it were available in MOPAC? NEB should be accessible to MOPAC later this year through integration with MDI, which will enable LAMMPS to perform NEB calculations on potential energy surfaces defined by MOPAC calculations.

Thank you! I'm very interested and have several pathways in my mind which suit for NEB method!

All of these bugs were locally reproduced and subsequently fixed by #56.

I needed an example of the filename case bug that you had previously reported because it was being triggered by a very special case (a single-line input file with no extra trailing line breaks).

Thank you!
Now, if you change the keywords on the same set to LOCATE-TS T=1D bar=0.04 EPS=78.4 GEO_REF="IMIN.ARC" GEO_DAT="INT.ARC" GEO-OK" the fixed version generates following error

Note: The following floating-point exceptions are signalling: IEEE_INVALID_FLAG IEEE_DIVIDE_BY_ZERO IEEE_UNDERFLOW_FLAG IEEE_DENORMAL. And yes, I know the LOCATE-TS has been developed and optimized for use with enzymes.

Getting this with on the latest github pull, compiled on Linux as always.

Program received signal SIGBUS: Access to an undefined portion of a memory object.

Backtrace for this error:

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
Segmentation fault

The files to reproduce the problem are in this tarball: files.tar.gz mopac transu.mop

Still having the same issue with this data-set. It segfaults after 4998th round after the message:

 FOR ONE QUANTUM TO BE USED
 INSTEAD 0.3KCAL/MOLE WILL BE USED TO START THE IRC

 GEOMETRY PRIORITY, STEP = 0.0500 ANGSTROMS

These latest two crashes have been fixed by #57, and your input files appear to be behaving as intended now.

However, I am not presently familiar with the specific transition-state finding algorithms available in MOPAC, so I am not immediately sure if they are truly functioning as intended. I am familiar with general expectations of transition-state finding, and your two geometries,imin.arc and int.arc, are probably too dissimilar for standard algorithms to successfully find a chemically sensible path between them. I could imagine NEB working, but only when seeded by a carefully initialized path of geometries that seed the intended reaction path.

Similarly, the IRC-based methods are based on vibrational modes, and the IRC=1* mode assumes that the initial geometry contains exactly one saddle point (near-zero forces & exactly one negative-frequency vibrational mode), and it will find the reactant and product associated with the local energy minimum on either side of the saddle. Complicated reactions may just not have a simple, clean 1-saddle structure to their potential energy surface.

These latest two crashes have been fixed by #57, and your input files appear to be behaving as intended now.

However, I am not presently familiar with the specific transition-state finding algorithms available in MOPAC, so I am not immediately sure if they are truly functioning as intended. I am familiar with general expectations of transition-state finding, and your two geometries,imin.arc and int.arc, are probably too dissimilar for standard algorithms to successfully find a chemically sensible path between them. I could imagine NEB working, but only when seeded by a carefully initialized path of geometries that seed the intended reaction path.

The reaction is pictet-spengler type of intramolecular ring formation. With the original input IRC=1* RESTART X-PRIORITY=0.05 PM7 EPS=78.4 HTML DRC PDBOUT GNORM=0 BIGCYCLES=5000 I cannot see any animation in HTML - wonder why.

Also, with the other data set using LOCATE-TS T=1D bar=0.04 EPS=78.4 GEO_REF="imin.arc" GEO_DAT="int.arc" GEO-OK
I receive this error:

          ATOMS     2 AND     1 ARE SEPARATED BY 0.0000 ANGSTROMS.

          TO CONTINUE CALCULATION SPECIFY "GEO-OK"

   NOTE THAT THE ATOM NUMBERS CORRESPOND TO THE 'CARTESIAN COORDINATES' ATOM LIST.

          GEOMETRY IN ERROR, FIX FAULT BEFORE CONTINUING. Atoms:     2 and     1
  Loop:   1  Energy minimization, excluding active site, using L-BFGS


     GRADIENT =  0.00000 IS LESS THAN CUTOFF =  6.00000



 ******************************************************************************
 *                                                                            *
 *     Error and normal termination messages reported in this calculation     *
 *                                                                            *
 * ATOMS     2 AND     1 ARE SEPARATED BY 0.0000 ANGSTROMS.                   *
 * GEOMETRY IN ERROR, FIX FAULT BEFORE CONTINUING. Atoms:     2 and     1     *
 * JOB ENDED NORMALLY                                                         *
 *                                                                            *
 ******************************************************************************

So the GEO-OK doesn't work now

LOCATE-TS effectively does the same thing as SADDLE, it just uses MOZYME calculations instead of MOPAC so that it can be applied to very large systems. This requires successful MOZYME calculations of the reactant and product to function correctly. However, do not expect LOCATE-TS to produce a fundamentally different output than SADDLE, it is calculating the same thing, just using a different solver for the electronic structure.

In your example, MOZYME appears to be having problems, as indicated by its charge assignment of +71 in your example (your target charge appears to be +1). MOZYME isn't setting up the calculations correctly, and the optimization process ends up collapsing both molecules in on themselves, which leads to the error your are observing. I am having problems with LOCATE-TS on much simpler examples, so it may take me a while to familiarize myself enough with this feature to root out any problems it might have.

One more note about your example: the geometry files state a charge of +1 that is not being set in any of the saddle-point calculations. If you intend to perform saddle-point calculations on the +1 charge state, you need to add that keyword to the main input file. All of the keywords are stripped from the geometry files.

According to its Wikipedia page, Pictet-Spengler reactions have a sequence of intermediates. In building a reaction path, you'll need to identify every stable intermediate and every saddle point in between neighboring intermediates. You can't expect MOPAC to automatically resolve an entire complicated reaction.

Do you have JSMol set up to work with local files on your web browser? It requires setting special security permissions because browsers prevent Javascript from accessing local files by default for security reasons. That is a prerequisite for successfully viewing the HTML files that MOPAC produces. I am having problems viewing the output for your example with BIGCYCLES=5000 because the reaction path has too many geometries and JSMol is taking an extremely long time to load it. With much smaller values of BIGCYCLES, I am seeing the animation function as intended.

The fix in #62 causes your LOCATE-TS example to behave more sensibly. It still isn't successfully converging to a saddle point, but I suspect that it is because LOCATE-TS requires the two input geometries to be separated by exactly one saddle point. Its behavior outside this regime seems to be more erratic than SADDLE.

While I don't think there are any active bugs remaining from what was described in this Issue, I'll leave it open for a while if you want to discuss related matters of saddle-point calculations in MOPAC.

I am not expecting MOPAC to automatically resolve an entire reaction. But for bug chasing my examples are good. This example is actually a tautomer pair and I would expect MOPAC to find the transition state.
example.tar.gz

LOCATE-TS runs end with problems in EF (calling FORMD and/or UPDHES) subroutine

CALCULATION IS TERMINATED TO AVOID ZERO DIVIDE

         in UPDHES

And with other files related also

 CALCULATION IS TERMINATED TO AVOID ZERO DIVIDE

          in FORMD

The saddle ends with message BOTH SYSTEMS ARE ON THE SAME SIDE OF THE TRANSITION STATE - which really is not the case. This happens every when the direction cosine is approximately -0.9

As a sidenote, my LOCATE-TS runs with LOCATE-TS(C:0,0,0,0,0;SET:2) GEO_DAT="INTERMEDIATE.ARC" GEO_REF="SAL.ARC" CHARGE=1 HTML PDBOUT XYZ + GNORM=0.0 DUMP=10M PL T=70.00D SCFCRT=1.D-20 MMOK RECALC=1 DMAX=0.05 BAR=0.001 EPS=78.4 OPT DDMIN=0.00001 always loop like this (this is different calculation) so I think there really is a problem updating the hessian matrix:

 NUMERICAL PROBLEMS IN BRACKETING LAMDA  -83594.605392770725       -83594605.392770723       -83594.605392854326        1.1026097505480802E-007   31204038664058988.     
  GOING FOR FIXED LEVEL SHIFTED NR STEP...
 ITER.      1 PLS= 0.155E-02           ENERGY  999999.00000 DELTAE   -2.4426798
 ITER.      2 PLS= 0.124E-01           ENERGY  999999.00000 DELTAE    2.1288701
 ITER.      3 PLS= 0.442E-02           ENERGY  999999.00000 DELTAE   -0.0047199
 ITER.      4 PLS= 0.188E-02           ENERGY  999999.00000 DELTAE    0.0021459

          HESSIAN CALCULATED NUMERICALLY

About the JSmol -problem - I cannot see any animation even if the files are on my webserver

About the JSMol problem, I'll need some more info. With BIGCYCLES=5000 in your example, the transu.pdb file that transu.html is visualizing contains 14,000 frames of animation. This is actually working for me, but JSMol takes about 18 minutes to load the file and just sits there doing nothing with a 100% CPU load while doing so. When I reduce the number of animation frames, either with a much smaller value of BIGCYCLES (e.g. 5) or by pruning the transu.pdb file manually, transu.html loads much more quickly.

I also don't know what your baseline experience is here. Does MOPAC's JSMol-driven html output work for you with other features, or is this your first time trying to use MOPAC's html output?

I'll need to investigate all of this a bit more, and familiarize myself with MOPAC's general capabilities for finding transition states and reaction paths before I can say anything definitive about this. My preliminary interpretation of this behavior is that it is more a limitation of the underlying methods and MOPAC's limited identification of failed assumption than a bug.

Both SADDLE and LOCATE-TS are based on the idea of taking two local minima in a potential energy surface (a "reactant" and a "product") and relaxing them in the presence of an increasingly strong pairing constraint that is basically a harmonic spring that is pushing the reactant and product geometries closer together. When the minima-to-saddle-to-2nd-minima potential energy surface looks close to a quartic potential, this method does a reliable job of converging to the saddle point. Outside of this scenario, there are three distinct things that can go wrong and prevent convergence to a saddle point - (1) both geometries can be in the same basin of convergence and converge to the same local equilibrium geometry, (2) there can be another local minimum between the two starting local minima, or (3) the PES near the saddle is highly asymmetric and the spring ends up pulling one of the geometries over the saddle towards the other local minimum rather than converge to the saddle. These simply aren't universal numerical methods, they have been empirically tested to work for simple chemical examples, and they will fail if the PES gets too complicated or non-standard.

In your tautomer pair example, the proton is crossing a ring to get from its original O-H bond to a new C-H bond, and I suspect there is a relatively flat plateau in the PES that causes a substantial deviation from the typical quartic structure. I don't know enough chemistry to know if they are chemically sensible, but it would certainly produce more mathematically well-behaved saddles if you consider a set of intermediates where the proton jumps to a closer carbon atom (and perhaps more than one) before arriving at the intermediate you are presently targeting. I've attached a modification of your example with a revised intermediate for which SADDLE behaves correctly [saddle-alt.zip].

I'm closing this Issue again because I believe that the strict bugs reported here are all fixed now, although the various saddle-point and transition-state methods in MOPAC certainly aren't as user-friendly or numerically robust as they could ideally be.

Again, if you observe related bugs, feel free to reopen this Issue or create a new Issue.

Also, it might be worthwhile to continue related correspondence on the Discussions board and have a more open-ended discussion about the productive usage of MOPAC's saddle-point and transition-state methods.

I can indeed locate the transition state with your attached modification (and if I use NOREOR, I cannot locate the transition state!). But for the IRC, I don't get anything sensible:
please see here

Again, I'm not highly experienced with this aspect of MOPAC, but I've thrown together an example that generates a sensible IRC from the geometry output of my previous transition state example: IRC-example.zip

I've discussed this a bit with Jimmy before, so I had some warnings and suggestions. I was not able to get IRC to work with the output of the SADDLE calculation because the gradient was too large. I thus had to refine the transition state first with the TS keyword in TS0.mop. IRC was able to function with the more refined geometry, although the numerics are not amazing and fail to produce a clean near-quartic potential energy surface.

Also, there are multiple clear signs that the derivatives are much less accurate with COSMO turned on (EPS=...), and this is probably why the extra TS step is needed to begin with. MOPAC does not have purely analytical first derivatives, and the various numerical contributions are not necessarily of equivalent quality. At the moment, I'm not sure if there are any keywords to improve their quality. This is definitely a detail on which I part company with Jimmy Stewart - I don't trust numerical derivatives in complicated, unsupervised settings, and I think that fully analytical derivatives are necessary for making simulation codes highly reliable. However, they are a technical investment, and you can't really just add them to a large, mature codebase - to be implemented effectively, they have to be a part of the core design.

I don't know what you did to force your posted IRC example to run, but IRC is fundamentally based on following a vibrational mode, hopefully the imaginary-frequency mode associated with the transition state. In that example, it looks like MOPAC decided to follow some kind of very floppy vibrational mode.

I understand. Refining the the transition state is acceptable - but all of the work is pretty much trial and error, thus completely against what Jimmy stated

While MOPAC calls upon many concepts in quantum theory and thermodynamics and uses some fairly advanced
mathematics, the user need not be familiar with these specialized topics

[...]

The simplest description of how MOPAC works is that the user creates a data-file which describes a molecular system
and specifies what kind of calculations and output are desired. The user then commands MOPAC to carry out the
calculation using that data-file. Finally, the user extracts the desired output on the system from the output files created
by MOPAC.

Clearly this is not the case here - and I fully agree that fully analytical derivatives should be calculated. If you want to see more wonky examples, they can be found from here

Those statements from Jimmy are not false, but they are a bit overly optimistic. If you weren't experiencing numerical problems and bugs, then those statements would probably be a lot more applicable to your experiences with MOPAC. The manual in its present state does not provide good guidance on the relative reliability of different features, the numerical pitfalls that a user might encounter, and how/if the pitfalls can be circumvented. There are a lot of worked examples on the old MOPAC website, but they aren't very well organized and they don't have the best guiding commentary.

I did begin the migration of MOPAC's website last year to a new format and location, but it is in an extremely incomplete state right now (and will be a massive time-sink to finish). When I resume website development, my first priority will be to finish migrating all of the keyword documentation, and then I will put together what I hope will be a comprehensive list of use cases with examples and clear guidance to set practical expectations. From your experiences, I suspect that the transition-state/reaction-coordinate documentation will be bristling with warnings and caveats.

Ultimately, you are using MOPAC rather than some other electronic structure code, and presumably for some good reasons (good performance, large feature set, at least the pretense of user-friendliness). While MOPAC remains popular, it has persistently lagged behind other popular QM codes in terms of financial and developer support. A lot of MOPAC's technical problems might be better under control if it was better supported, but those resources are flowing elsewhere (whether or not their targets are actually useful to scientists) and their overall magnitude may be insufficient regardless. I've been work in electronic structure for a long time now, and it's quite clear that we don't have especially good software because science collectively is unwilling to pay for it or otherwise incentivize it. MOPAC (and a lot of other QM codes) have been held together mostly by pride, passion, and stubbornness rather than consistent or generous support. Pride, passion, and stubbornness can only get you so far.

You may have a better experience with MOPAC if you adjust your approach a bit. Rather than take the documentation at face value and try to use multiple, infrequently-used features together, you may benefit from scoping out and discussing a workflow in advance with a developer (e.g. me) and constructing minimal examples of workflows functioning correctly as a reference for more complicated applications. I would be happy to assist, either through private correspondence or on the GitHub Discussions board. A caveat of this offer is that I'll be on vacation and without regular internet access for the next 3 weeks.

I would like to take on your offer via private correspondence - let's continue from this after 3 weeks. Have a really nice and refreshing vacation!

I don't have contact information for you, but I believe that you will be notified of this post. If you still want to initiate the private correspondence, then email me at jemoussa@vt.edu.

NEB

Any timetables for implementing it? I'm keen to test it!

The MDI integration that will facilitate NEB calculations (#52) ended up getting delayed because of other priorities that I had recently. However, in the new year, finishing the basic MDI integration is now a high priority.

NEB calculations in MOPAC are also possible right now by using the recently added NEB features in QCArchive. MOPAC is available to QCArchive through its QCEngine subsystem. It doesn't support MOZYME calculations yet, but that wouldn't be hard to do if there was interest in it. QCArchive is server-based software, so you either need to set up a server or get access to someone else's.

The lead developer of QCArchive might be able to offer some assistance in running or accessing a QCArchive server. @bennybp