handling outlier molecules in `match_minima.py`
vtlim opened this issue · 2 comments
On the full benchmark set, while generating plots with `match_minima.py`
using data read in from a pickle file, memory usage grows extremely high (observed > 60 GB) and the process is eventually killed.
I am working on identifying the areas of high memory use with the `memory_profiler`
package, run as:
PYTHONPATH=../ python -m memory_profiler ../match_minima.py -i match.in --cutoff 1.0 --plot --readpickle
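As a complementary check, the standard-library `tracemalloc` module can report which source lines allocate the most memory. This is not what the command above uses; it is a minimal sketch of a different tool. `build_plot_data` is a hypothetical stand-in for the plotting step, not a function from `match_minima.py`.

```python
import tracemalloc


def build_plot_data(n_points):
    # Hypothetical stand-in for the plot-generation step that uses a lot of memory.
    return [float(i) for i in range(n_points)]


tracemalloc.start()
data = build_plot_data(100_000)

# Current and peak traced memory (in bytes) since tracemalloc.start().
current, peak = tracemalloc.get_traced_memory()
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

# Group allocations by source line and show the largest few.
top_stats = snapshot.statistics("lineno")
for stat in top_stats[:3]:
    print(stat)
print(f"current={current / 1e6:.1f} MB, peak={peak / 1e6:.1f} MB")
```

Note that `tracemalloc` only sees allocations made through Python's memory allocators, so memory held by compiled libraries may not show up in full.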
The issue was traced back to molecules with extremely disparate energies. For example, in this plot (disregarding the RMSD axis), some GAFF energies are exceedingly high, around 3.5e7 kcal/mol.
This particular case is due to GAFF missing a specific vdW parameter for polar hydrogen atoms, which leads to overlapping atoms. Additional molecules with this issue are here:
One solution might be to check whether any FF energy, relative to the reference method, exceeds some cutoff, and if so, skip generating plots for that molecule. The cutoff would be arbitrary, though; say 1000 kcal/mol?
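A minimal sketch of that cutoff check, under some assumptions: the energies are held in a dict mapping method name to a per-molecule list of conformer energies in kcal/mol, and "greater than some cutoff" is taken to mean the absolute difference from the reference value. The function name `flag_outlier_mols`, the data layout, and the method names `ref` and `FF2` are hypothetical and are not taken from `match_minima.py`.

```python
import numpy as np


def flag_outlier_mols(energies, ref_method, cutoff=1000.0):
    """Return indices of molecules to skip when plotting.

    energies: dict of method name -> list (one entry per molecule) of
        conformer energies in kcal/mol. This layout is an assumption.
    ref_method: key of the reference method in `energies`.
    cutoff: maximum allowed |E_ff - E_ref| in kcal/mol (arbitrary default).
    """
    ref = energies[ref_method]
    outliers = set()
    for method, mols in energies.items():
        if method == ref_method:
            continue
        for i, (ff_vals, ref_vals) in enumerate(zip(mols, ref)):
            diff = np.abs(np.asarray(ff_vals, dtype=float)
                          - np.asarray(ref_vals, dtype=float))
            # Ignore NaN entries (e.g. unmatched conformers) when comparing.
            if np.any(diff[~np.isnan(diff)] > cutoff):
                outliers.add(i)
    return outliers


# Toy data: molecule 0 has one GAFF conformer energy of 3.5e7 kcal/mol.
example = {
    "ref": [[0.0, 1.2, 3.4], [0.0, 2.0]],
    "GAFF": [[0.0, 1.5, 3.5e7], [0.0, 2.3]],
    "FF2": [[0.0, 1.0, 3.0], [0.0, float("nan")]],
}
skip = flag_outlier_mols(example, "ref")
print(sorted(skip))
```

The plotting loop could then skip any molecule whose index is in `skip`. Returning a set of indices rather than filtering in place leaves the loaded data untouched, so the skipped molecules can still be listed or inspected afterwards.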
A temporary workaround is commented in https://github.com/MobleyLab/benchmarkff/blob/master/03_analysis/match_minima.py#L847-L852