aoterodelaroza/critic2

Missing bonds and an atom bonded to itself

samblau opened this issue · 15 comments

Hi @aoterodelaroza,

I've written an automated workflow to print cube files with Q-Chem and then analyze them with Critic2, and I've run this workflow on over 25,000 optimized structures that are relevant to aqueous electrolyte reactions. I'm just starting to dig into the data and have found two systems that Critic2 isn't handling correctly which I've named neg (the molecule has a -1 total charge) and neut (the molecule is neutral). Here are links to zipped directories containing the XYZ, the cube, the critic output, and a screenshot pointing to the missing bond:

https://www.dropbox.com/s/ajqxwaxdrezxs5m/neg.zip?dl=0
https://www.dropbox.com/s/sikcf0qc92vjspb/neut.zip?dl=0

Let's start with the neut example. While the molecule has 18 bonds, Critic2 only finds 17, missing this near-linear CO bond:

neut_missing

In the neg example, Critic2 misses a different CO bond on the other side of the molecule:

neg_missing

and instead reports that the central carbon in the CO3 group is bonded to itself.

Please let me know if I can provide any further information to help you diagnose and fix these issues. I will also let you know of further problematic cases I find as I dig deeper into the data.

Thanks,
Sam

Thanks, Sam. This is great testing! The missing bonds I'm not surprised to see, because of the known poor performance of critic in locating critical points from fields defined on a grid. I made some progress with the implementation of the grid CP search using discrete Morse theory (the earlier Gyulassy paper without any chemistry in it is actually much clearer) but it'll still take several weeks, as I'm quite swamped with other stuff. As for the atom bonded to itself, that is likely caused by a spurious CP having both its eigenvectors trace gradient paths that end up in the same atom, but I'll take a look.

hey @aoterodelaroza, hope things are going well. I just updated my critic workflow infrastructure for the first time since late 2019 and did a recompile in the process, and I figured I'd check back in about this. Were you ever able to track down the source of these missing bonds and improve the code? Thanks!

I saw Guylassy published a library (BSD-licensed), https://github.com/sci-visus/MSCEER, I'm not sure if any of that would be useful to you for a reference implementation and TopoMS is BSD-licensed too.

I'll try to find some time to attempt this in the near future - it's been too long.

Hi @aoterodelaroza, just wondering if there was an update on this, no worries if not! Just excited to try it out :-) Having some gridded data where the current algorithm seems to reliably fail between pairs of atoms that, while not symmetrically equivalent, clearly should have equivalent bond paths but some are missing, and this seems to be the case even when significantly increasing grid resolution.

There is. I tried to implement the method as described in Gyulassy's papers. Because I'm unfamiliar with the language they use, I first tried to implement a prototype of the method in an octave script where I tried to find the critical points of a 2-dimensional periodic function built as a sum of Gaussians in a 2D square unit cell. My attempt is here and the result of the CP search in graphical form is:
plot.pdf where squares represent maxima, crosses represent bonds, and circles represent minima. As you can see, there are very many spurious critical points.

Now, if I understand the papers correctly, this is the expected result. The problem they are trying to solve is to find the set of critical points that are consistent with a discrete version of the gradient vector field. This set of critical points they find may
contain very many spurious points in the sense that they are not present in the continuous vector field that generated the grid data. However, the argument is that when you go from the analytical function to the grid, information is lost, and the whole CP complex is topologically consistent, that is, the Morse sum (in the 2D case, maxima + minima - bonds) is correct. In the example above, there are 33 minima, 39 maxima, and 72 saddles,and 33 - 72 + 39 = 0. I believe my implementation of this first part of the algorithm is correct, and it would not be a huge task to port it to 3D non-orthogonal grids.

Because the spurious critical points are obviously inconsistent with the continuous density that generated the grid, the authors propose a second part where they cancel pairs of critical points based on their "persistence", i.e., the difference in density between the critical points. CP pairs can be cancelled only if they differ by 1 in dimension (for instance, maxima and saddles and minima and saddles but not minima and maxima). After the cancellation, the discrete vector field is "retouched" by reverting the direction of some of the gradient paths in order to maintain the consistency between the CP complex and the discrete vector field. I tried to implement this in the commented out part of the script (the script shows the latest version - earlier versions were simpler). The problem is that, while the papers describe the first part of the algorithm very well, this persistence-based clean up of the complex is not described in as much detail, and I am missing important steps. First, I tried a clean up in the naive way (sort all the pairs of CPs by persistence then kill from the bottom up to a certain persistence threshold). This kind of works but I end up with "cycles" of critical points that I'm not sure can be eliminated in this way (i.e. you would have to cancel pairs of pairs to get rid of them. If you just cancel one pair, you'd end up with an inconsistent vector field). Then I tried coming up with fancier and increasingly convoluted versions, to no avail.

After that, my courses happened and I went on to teach and to working on other projects. I'm a bit disappointed, for several reasons. First, I couldn't make either MSCEER or TopoMS work and give reasonable results and no further commits have been pushed to the repos (maybe the versions on github are not complete and development happens elsewhere?). In absence of a reference implementation, I'd be a very large time investment trying to figure out how to do the critical point clean up in the second part. Second, even if the implementation worked as the authors intended, I'm not sure if it can work reliably without user intervention. The papers mention that the persistence-based clean up has a user-controlled persistence threshold, and the molecular graph is understood as a function of this threshold. If I'm not mistaken, from the point of view of the algorithm the spurious CPs are not wrong - they are, after all, consistent with the discrete vector field. It's up to you to decide if they are spurious or not. This is somewhat acceptable if you have one system and you have time to get the graph you want, not so much if you intend to use critic2 to process a large number of materials automatically. Third, from the little I read in the TopoMS code, I doubt very much the implementation can handle non-orthogonal grids, which of course is a problem for us (the articles are written in a more general way, so this is not an insurmountable problem). And, lastly, I have a number of niggling points about the code, such as how the nearest neighbor distances are calculated in periodic systems, but these are minor problems.

In summary, I tried my best but without help from the authors it'd be too time consuming to put the puzzle together. If any of you want to take a crack at it, you are welcome to my script above and I suggest starting with the 2D problem given in it. Otherwise, I think my I'll move on to trying to fit an analytical representation to the density grid, same as it is done in density-fitting methods, and then use that to find the critical points. It'll only work for densities, but that's what most people care about anyway.

Also, if this method is implemented, it should have to pass a correctness test to be acceptable. It seems to me the papers' priorities are efficiency > consistency > correctness, and in my opinion it should be the other way around.

Ah yes, this is disappointing indeed, but I'm thoroughly convinced by your explanation -- thank you for sharing this information and for your attempts. I agree that efficiency is the wrong priority; as you say, if you have to make a determination about these spurious points, it's not feasible for large-scale processing, where you need the correctness first and foremost.

Hi all. I've finally found some time to take a closer look into this problem, and wrote up this. The article describes the problems mentioned above, and proposes a solution for the all-electron density, which I think fixes the problems you have encountered. The new interpolation method is accessed with the SMOOTHRHO option to LOAD. Four examples with QE, abinit, VASP, and Gaussian cube files are provided in the manual. If you are still on this, please try the new keyword and let me know how it goes. Note, however, that it works only for all-electron densities.

For the examples you provided (thanks for keeping them in the dropbox all this time), these inputs work:

molecule neg.dens.0.cube
load neg.dens.0.cube smoothrho

auto discard "$1 < 1e-5"
cpreport neg.vmd graph

This gives the same topology:

* Critical point list, final report (non-equivalent cps)
  Topological class (n|b|r|c): 19(19) 18(18) 0(0) 0(0) 
  Poincare-Hopf sum: 1

for neg and neut. The output files:
neut.cro.txt
neg.cro.txt
and the molecular graphs look like this:
neg

The covellite example mentioned in issue #7 is considered explicitly in the example from the manual. All critical points were found and there are no more spurious self-bonding atoms.

Also relevant for issue #29 but I couldn't find the source files for that one. I'm confident that it'll work, though.

If you are still on this, please try the new keyword and let me know how it goes. Note, however, that it works only for all-electron densities.

This is excellent work; thank you for the update and sharing the examples! This work has been on pause for lack of a reliable method, so it will be exciting to revisit it :)

OK! I'll close this as solved, then. Feel free to reopen if you find any problems.