Smiles/inchi input and PDBQT output
flatstik opened this issue ยท 12 comments
I would love to see additional feature, where I could just assign GENERATE="smiles_or_inchi_string_here"
command along with other parameters, such as PM7 ADD-H OPT GNORM=0.0
so that mopac would generate (and optimize) the molecule on the fly. Maybe this could be piggybacked from openbabel. Also, additional PDBQT-output along with would be nice to have which would retain the partial charges of this. See openbabel/openbabel#2595
Now I am using a script which handles all of this, but I'd like to get rid of the autodocktools and the gasteiger charges completely.
I think the PDBQT-out could be done with something like this (untested):
subroutine pdbqtout(input_fn, idate, version, ncomments, all_comments, nbreaks, breaks, natoms, labels, elemnt, txtatm1, q2, coord, temp_factor, ele_pdbqt)
implicit none
character(len=80), intent(in) :: input_fn
character(len=10), intent(in) :: idate, version
integer, intent(in) :: ncomments, nbreaks, natoms
character(len=80), dimension(ncomments), intent(in) :: all_comments
integer, dimension(nbreaks) :: breaks
character(len=4), dimension(natoms), intent(in) :: labels
integer, dimension(natoms) :: txtatm1
real(kind=8), dimension(natoms) :: q2
real(kind=8), dimension(3, natoms), intent(in) :: coord
real(kind=8), dimension(natoms) :: temp_factor
character(len=4), dimension(natoms), intent(in) :: elemnt
character(len=4) :: ele_pdbqt
integer :: i, i1, k, nline
logical :: ter, ter_ok, branch
integer :: torsdof
nline = 0
ter_ok = .true.
ter = .true.
torsdof = 0
branch = .false.
open (unit=11, file=trim(input_fn) // ".pdbqt", status="replace")
write (11, '(A80)') 'HEADER data-set: ', trim(input_fn)(:i - 5)
write (11, '(A80)') 'REMARK 3 MOPAC, Version: ' // trim(version) // ' Date: ' // trim(date(5:11)) // trim(date(21:)) // trim(idate(11:16))
do i = 1, ncomments
write (11, '(A80)') 'REMARK 3', trim(all_comments(i))
end do
do i = 1, nbreaks
if (breaks(i) /= 0) then
write (11, '(A80)') 'REMARK 3' // ' Charges were multiplied by 0.5 in break' // trim(str(i)) // ' geometry'
end if
end do
write (11, '(A4)') "ROOT"
do i1 = 1, natoms
i = txtatm1(i1)
if (i == 199) then
ele_pdbqt = " H"
else
ele_pdbqt = elemnt(i)
end if
write (11, '(A6,I5,A4,A1,A3,A4,I2,A3,3F8.3,2F8.3,2X,F8.3,A2)') &
"ATOM", i1, " ", trim(ele_pdbqt), " UNL", i1, " ", coord(1,i1), &
coord(2,i1), coord(3,i1), 1.00, temp_factor(i), q2(i), " ", trim(labels(i))
if (i == nline) then
ter_ok = .false.
else if (i == nline + 1) then
ter_ok = .true.
else
ter_ok = .true.
ter = .true.
nline = i
end if
if (ter_ok) then
if (branch) then
write (11, '(A6)') "BRANCH"
branch = .false.
else if (ter) then
write (11, '(A3)') "TER"
ter = .false.
end if
end if
if (labels(i) == " BR") then
branch = .true.
end if
end do
if (branch) then
write (11, '(A6)') "BRANCH"
end if
write (11, '(A8,I3)') "TORSDOF", torsdof
close (unit=11)
end subroutine pdbqtout
It requires the ele_pdbqt array of the elements (could be done by ele_pdb?, provided by the pdbout.F90). I'm very much unsure if this is correct approach all together. I haven't coded fortran at all.
Could it also be done like this:
subroutine pdbqtout(input_fn, idate, version, ncomments, all_comments, nbreaks, breaks, natoms, labels, elemnt, txtatm1, q2, coord, temp_factor, ele_pdbqt, pdb_header, pdb_remarks)
implicit none
character(len=80), intent(in) :: input_fn
character(len=10), intent(in) :: idate, version
integer, intent(in) :: ncomments, nbreaks, natoms
character(len=80), dimension(ncomments), intent(in) :: all_comments
integer, dimension(nbreaks) :: breaks
character(len=4), dimension(natoms), intent(in) :: labels
integer, dimension(natoms) :: txtatm1
real(kind=8), dimension(natoms) :: q2
real(kind=8), dimension(3, natoms), intent(in) :: coord
real(kind=8), dimension(natoms) :: temp_factor
character(len=4), dimension(natoms), intent(in) :: elemnt
character(len=4) :: ele_pdbqt
character(len=80), intent(in) :: pdb_header
character(len=80), dimension(:), intent(in) :: pdb_remarks
integer :: i, i1, i2, k, nline
logical :: ter, ter_ok
integer :: maxtxt
character(len=80) :: line
character(len=1) :: x
nline = 0
ter_ok = .true.
ter = .true.
open (unit=11, file=trim(input_fn) // ".pdbqt", status="replace")
! Write PDB header
write (11, '(A80)') trim(pdb_header)
! Write PDB remarks
do i = 1, size(pdb_remarks)
write (11, '(A80)') trim(pdb_remarks(i))
end do
maxtxt = len(txtatm1(1))
do i = 1, natoms
i1 = txtatm1(i)
i2 = i
nline = nline + 1
if (ter_ok) then
ter = (i == breaks(nbreaks))
end if
if (ter) then
nbreaks = nbreaks + 1
write (11, "(A,I5,A)") "TER", i2, txtatm(i)(18:26) ! Write out the TERMINAL marker
end if
if (labels(i) /= 99 .and. l_atom(i1) .and. elemnt(labels(i))(1:1) /= " ") then
ele_pdbqt(1:1) = elemnt(labels(i))(1:1)
ele_pdbqt(2:2) = Char(Ichar("A") - Ichar("a") + Ichar(elemnt(labels(i))(2:2)))
else
ele_pdbqt(1:1) = " "
ele_pdbqt(2:2) = elemnt(labels(i))(2:2)
end if
x = txtatm(i)(13:13)
if (x == "X") then
txtatm(i)(13:13) = " "
end if
write (11, '(A6,I5,A4,A1,A3,A4,I2,A3,3F8.3,2F8.3,2X,F8.3,A2)') "ATOM", i2, txtatm(i)(12:15), " ", txtatm(i)(17:19), i2, txtatm(i)(21:22), &
& coord(1, i), coord(2, i), coord(3, i), 1.00, temp_factor(i), q2(i), ele_pdbqt
txtatm(i)(13:13) = x
end do
! Write branch information
do i = 1, nbreaks
write (11, "(A8,I5,I5)") "BRANCH", breaks(i), breaks(i) + 1
end do
! Write TORSDOF line
write (11, "(A8,I1)") "TORSDOF", nbreaks
close (unit=11)
end subroutine pdbqtout
About SMILES/InChi strings: I do not want to support more input options to MOPAC, nor do I want to add new dependencies like OpenBabel, which would be the easiest way to implement such a feature as you say. MOPAC is too monolithic as it is, and workflows should be able to use OpenBabel and MOPAC in separate steps if they need flexible file format conversion - atomistic simulation and molecular file format conversion are very different tasks that should be handled by very different software.
Giving the option to output PDBQT is a reasonable request that I can probably accommodate in a timely manner. Specifically, the PDB format does seem to provide space in columns 67 - 76 for non-standard extensions, which is reasonable place to insert partial charges as per the PDBQT format. While I'm slightly philosophically opposed to jamming even more file format options into MOPAC, this is actually just supporting an existing format (PDB) in a more featureful manner. It provides a convenient way to extract partial charge information, which is often why people are using MOPAC in the first place.
Giving the option to output PDBQT is a reasonable request that I can probably accommodate in a timely manner. Specifically, the PDB format does seem to provide space in columns 67 - 76 for non-standard extensions, which is reasonable place to insert partial charges as per the PDBQT format. While I'm slightly philosophically opposed to jamming even more file format options into MOPAC, this is actually just supporting an existing format (PDB) in a more featureful manner. It provides a convenient way to extract partial charge information, which is often why people are using MOPAC in the first place.
I would appreciate this very much. I use MOPAC mainly to optimize the geometries of molecules and proteins and use them for docking simulations. The NEB-function is also something I'd like to see as most of my trials with protein TS calculations have been just a waste of time..
The current plan is to support NEB with MOPAC through two external programs: an interface with MDI (still under development) that has its own NEB driver, and through the existing interface with QCEngine/QCArchive that can run NEB calculations using geomeTRIC. Implementing MOPAC's MDI interface is my responsibility right now, and it has been hovering near the top of my list of priorities for a while, and the QCArchive developers have NEB working for internal testing purposes, but I don't think it will be released until the student working on it writes a paper about it.
@godotalgorithm, have you had any time to work on the PDBQT-output option yet?
The PDB output will now include partial charge information in the necessary columns to act as a PDBQT file if you add the PRTCHAR keyword to a job. As far as I can tell from online documentation, the partial charge information is the only difference between a PDB and PDBQT file. If there is other missing information from a proper PDBQT file that MOPAC can provide, then please reopen this Issue and clarify further.
Works like charm! Thank you very much!
Iirc, mopac output has reported temperature factors or the so called B-factor (on pdb, columns 61 - 66). For some reason, it seems to be currently flat 0 with the PRTCHAR. Other than that - great job.
By default, MOPAC has been using the temperature B-factor columns to feed partial charge information to a GUI (the B-factor is the partial charge x10). This is redundant if the optional columns are being used for partial charge information, so I just zeroed them out in that case. If I can adjust this GUI behavior and if the optional columns don't cause problems with other programs, then I might make this the default PDB output behavior before the next MOPAC version release (i.e. PRTCHAR wouldn't be needed for this behavior).
At least for autodock vina (and other docking programs of the like) the output of geo-optimized small ligands is just dandy with the exception that I have to remove the headers and the last row (END) from the output. The output is rigid, but simple obabel conversion obabel -i pdbqt output.pdb -o pdbqt output.pdbqt
adds the branches correctly.
PyMOL (2.4.0) does not display the PRTCHAR output of proteins correctly. The resulting pdb needs to be converted (obabel -i pdbqt file.pdb -o pdb -O output.pdb
or obabel -i pdbqt file.pdb -o pdbqt -O output.pdbqt
(with or without -r).
Yeah, in addition to adding partial charge information to each ATOM/HETATM, the PDBQT format adds "ROOT" and "BRANCH" delimiters to a PDB file and other extra lines such as the number of torsional degrees of freedom, "TORSDOF". MOPAC presently does not have a direct way to perform this partitioning into torsional degrees of freedom, so the remainder of the work is best left to Open Babel, as you are presently doing.
Yup. And that's perfectly fine. This is something which probably should added to manual....
This is rather for people to find the information how-to from search engine.
I am, by the way, willing to writing the man-pages and compiling/mainaining the deb-package for of mopac if needed..
Yes, this type of documentation - common workflows needed to interface MOPAC with other software - is exactly what I want in a list of use cases once I start updating the new MOPAC documentation website again. The website has been dormant for a while, but I will shift my main attention there after 2 more MOPAC releases - an imminent release associated with the new PM6-ORG model in MOPAC, and then a release after I complete the 2 open PRs for new MOPAC interfacing.
I was just informed yesterday by #172 that MOPAC is available on Debian and many other Linux package managers.