MobleyLab/benchmarkff

handling outlier molecules in `match_minima.py`

vtlim opened this issue · 2 comments

vtlim commented

On the full benchmark set, generating plots with `match_minima.py` from data read in from a pickle file makes memory usage grow extremely high (observed > 60 GB), and the process is eventually killed.

I am working on identifying the high-memory-use areas using the memory_profiler package with:
`PYTHONPATH=../ python -m memory_profiler ../match_minima.py -i match.in --cutoff 1.0 --plot --readpickle`
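
As a cross-check that needs no extra package, the standard-library `tracemalloc` module can also point at the lines allocating the most memory. This is a different tool from memory_profiler, and the sketch below is only illustrative: `top_allocations` and `build_big_list` are hypothetical names, with `build_big_list` standing in for a memory-heavy plotting step rather than anything in `match_minima.py`.

```python
import tracemalloc


def top_allocations(func, *args, limit=5, **kwargs):
    """Run func under tracemalloc; return (result, top allocation sites)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        # Take the snapshot while the result is still alive so that its
        # allocations are counted.
        snapshot = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # Group traced allocations by source line, largest first.
    stats = snapshot.statistics("lineno")[:limit]
    return result, stats


def build_big_list(n):
    # Hypothetical stand-in for a memory-heavy step.
    return [float(i) for i in range(n)]


result, stats = top_allocations(build_big_list, 100_000)
for stat in stats:
    print(stat)
```

Each printed entry gives a file, line number, total size, and allocation count, which might help narrow down which step of the plotting code holds on to the most memory.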

vtlim commented

The issue was traced back to molecules with extremely disparate energies. For example, in this plot (disregarding the RMSD axis), some GAFF energies are exceedingly high (3.5e7 kcal/mol).
image

This particular case is due to GAFF missing a specific vdW parameter for polar hydrogen atoms, which leads to overlapping atoms. Additional molecules with this issue are here:
image

A possible solution is to check whether any of the FF energies, compared to the reference method, exceeds some cutoff, and if so skip generating plots for that molecule. The cutoff would be arbitrarily defined though, say 1000 kcal/mol?
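
A minimal sketch of that check, under some assumptions: energies are stored per molecule as a dict mapping method name to a list of conformer energies in kcal/mol, and "compared to the reference" is read as the absolute difference from the reference value for the same conformer. The names `exceeds_cutoff` and `split_outliers` and the data layout are hypothetical, not what `match_minima.py` uses, and the numbers are made up for illustration.

```python
def exceeds_cutoff(ref_energies, ff_energies, cutoff=1000.0):
    """True if any FF energy is more than `cutoff` kcal/mol from the reference."""
    return any(abs(ff - ref) > cutoff
               for ref, ff in zip(ref_energies, ff_energies))


def split_outliers(mols, ref_method, cutoff=1000.0):
    """Split molecule names into (keep, skipped) based on the cutoff.

    `mols` maps molecule name -> {method name -> list of energies}.
    This layout is an assumption made for the example.
    """
    keep, skipped = [], []
    for name, by_method in mols.items():
        ref = by_method[ref_method]
        is_outlier = any(
            exceeds_cutoff(ref, energies, cutoff)
            for method, energies in by_method.items()
            if method != ref_method
        )
        (skipped if is_outlier else keep).append(name)
    return keep, skipped


# Illustrative (made-up) energies; mol_b mimics the overlapping-atom case.
mols = {
    "mol_a": {"ref": [0.0, 1.2, 2.5], "GAFF": [0.0, 1.0, 3.1]},
    "mol_b": {"ref": [0.0, 0.8], "GAFF": [0.0, 3.5e7]},
}
keep, skipped = split_outliers(mols, ref_method="ref", cutoff=1000.0)
print("plot:", keep)
print("skip:", skipped)
```

Returning the skipped names rather than dropping them silently would make it easy to log which molecules were excluded and to revisit the cutoff later.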