Script for generating AM1-BCC charges?
ghutchis opened this issue · 9 comments
Is your feature request related to a problem? Please describe.
The AM1-BCC charge scheme is fairly popular in drug discovery, but as far as I can tell, only Amber and OpenEye toolkits can generate them.
Describe the solution you'd like
I'd love to have an example script that can take a molecule in SDF format and use Open Babel or RDKit to calculate AM1-BCC charges using open-source MOPAC.
This is a reasonable request, and the possibility of AM1-BCC in MOPAC has come up before in development discussions. The AM1 atomic charge part of the AM1-BCC model is straightforward to produce with MOPAC, it is mainly the atom and bond typing that is required for the BCC correction that can get tricky. As suggested, this process needs connectivity information, which could be input from something like an SDF file or produced internally by MOPAC using its Lewis structure assignment feature. Even with this connectivity information, there is a non-trivial amount of logic that is needed to make the type assignments, and without an existing open-source implementation, someone will have to re-implement it.
Are you aware of any open-source implementations of comparable/compatible atom and bond typing given molecular connectivity information?
Whether as an external script or integrated into MOPAC, I do see this as a desirable feature. I'd be very willing to accept a feature like that, and I might be willing to take a pass at implementing the typing logic myself in the near future.
Presumably Amber and/or OpenEye can be used to verify an implementation. With so few previous implementations, I'm slightly worried about a deviation between principle and practice - the typing logic in the implementation papers might not 100% match what Amber/OpenEye do now.
Are you aware of any open-source implementations of comparable/compatible atom and bond typing given molecular connectivity information?
Yes. AmberTools is GPL and has the relevant typing and code.
http://ambermd.org/GetAmber.php#ambertools
conda install -c conda-forge ambertools=22
For example:
antechamber -i name.sdf -fi sdf -o lig1.mol2 -fo mol2 -c bcc # typing and BCC charges to mol2
antechamber -i lig1.mol2 -fi mol2 -o lig2.mol2 -fo mol2 -c wc -cf charges.txt # write to charges.txt
I haven't looked through the source, but antechamber
prints out what it's doing with sqm
, etc.
I can certainly take a test set of molecules and generate the relevant types / charges for comparison. Let me know if you have a preferred set or I'll find something we have here.
I may be slightly misinterpreting this request and/or the availability of various components of Amber. The antechamber
program from AmberTools has command-line options to use mopac
in place of the default sqm
option for the AM1 calculation (-df 0
). However, this use case isn't well documented, and it appears to require a missing script file, mopac.sh
, in the same directory as the AmberTools programs. This file runs MOPAC, making sure that the input file is named mopac.in
and the output file is named mopac.out
. For example, I got it to work with the script:
sed 's/ANALYT//g' mopac.in > mopac_temp.mop
mopac mopac_temp.mop
cp mopac_temp.out mopac.out
rm mopac_temp.*
I also had to remove an obsolete, unnecessary keyword, ANALYT
, that antechamber
is trying to use. This appears to run correctly, but I didn't verify that the results are equivalent to when sqm
is used. I expect that you have a more handy and thorough set of tests, and if there are further problems or related requests, feel free to reopen this issue. Otherwise, I'm not sure there is anything for me to do here, since antechamber
appears to be able to meet your request with a small amount of tinkering.
Just for curiosity - is there some reason why to use AM1-BCC instead of PM7 charges?
There are people who prefer AM1-BCC for various reasons. Personally, since it emulates HF, I'd probably prefer some sort of DFT-RESP model or a more advanced electrostatics model, but YMMV.
I'm hoping for an implementation outside of antechamber
but if you don't want to deal with the BCC part, that's fine.
I was just unclear about the request - whether it was to have a way to produce AM1-BCC results using MOPAC AM1 outputs (which antechamber
does allow for) or to remove the AmberTools dependency completely. I can't just copy the atomtype
and bondtype
source from AmberTools into MOPAC unless I move the overall license "further left" from LGPL to GPL, which I am not prepared to do. However, having self-contained open-source references for these features would make them a lot easier to independently re-implement in MOPAC if necessary.
This isn't going to have a high priority since it's more about eliminating an undesirable dependency than providing previously unavailable functionality, but I'll consider it to be under active consideration.
As for reasons to use AM1-BCC charges over raw PM6/PM7 charges, I can see pros/cons to both. AM1-BCC was explicitly fit to reproduce charge data (Hartree-Fock-based RESP charges), whereas the most direct modeling pressure on PM6/PM7 to produce good charges is dipole-moment reference data. On the other hand, the reference data for AM1-BCC isn't necessarily that accurate (Hartree-Fock rather than post-HF data), the BCC corrections don't seem to be that large in practice, and AM1-BCC charges are not continuous functions of atomic coordinates since they depend on discrete connectivity data. There is a recent survey of semiempirical dipole moments where PM6/PM7 are reasonably keeping pace with other models, but again I'm not sure what this says about the quality of the underlying charges.
I don't know if I have a ton of free coding time, since I'm pouring most of it into Avogadro2 right now. But I'll see if we can implement some of the BCC pieces in straight RDKit or some other form.
For anyone trying to use antechamber
with MOPAC, there is another problem with the MOPAC input files that it generates besides the presence of the obsolete ANALYT
keyword. The input file has only one comment line instead of two, so the first atom in the molecule is not read in properly. A mopac.sh
job script that fixes the problem is:
sed 's/ANALYT//g' mopac.in | sed 's/for mopac/for mopac\n/g' > mopac_temp.mop
mopac mopac_temp.mop
cp mopac_temp.out mopac.out
rm mopac_temp.*
I've reported this problem to AmberTools developers, and it will hopefully be fixed in future versions.