openmopac/mopac

Smiles/inchi input and PDBQT output

flatstik opened this issue ยท 12 comments

I would love to see additional feature, where I could just assign GENERATE="smiles_or_inchi_string_here" command along with other parameters, such as PM7 ADD-H OPT GNORM=0.0 so that mopac would generate (and optimize) the molecule on the fly. Maybe this could be piggybacked from openbabel. Also, additional PDBQT-output along with would be nice to have which would retain the partial charges of this. See openbabel/openbabel#2595

Now I am using a script which handles all of this, but I'd like to get rid of the autodocktools and the gasteiger charges completely.

I think the PDBQT-out could be done with something like this (untested):

subroutine pdbqtout(input_fn, idate, version, ncomments, all_comments, nbreaks, breaks, natoms, labels, elemnt, txtatm1, q2, coord, temp_factor, ele_pdbqt)
  implicit none
  character(len=80), intent(in) :: input_fn
  character(len=10), intent(in) :: idate, version
  integer, intent(in) :: ncomments, nbreaks, natoms
  character(len=80), dimension(ncomments), intent(in) :: all_comments
  integer, dimension(nbreaks) :: breaks
  character(len=4), dimension(natoms), intent(in) :: labels
  integer, dimension(natoms) :: txtatm1
  real(kind=8), dimension(natoms) :: q2
  real(kind=8), dimension(3, natoms), intent(in) :: coord
  real(kind=8), dimension(natoms) :: temp_factor
  character(len=4), dimension(natoms), intent(in) :: elemnt
  character(len=4) :: ele_pdbqt
  integer :: i, i1, k, nline
  logical :: ter, ter_ok, branch
  integer :: torsdof

  nline = 0
  ter_ok = .true.
  ter = .true.
  torsdof = 0
  branch = .false.

  open (unit=11, file=trim(input_fn) // ".pdbqt", status="replace")
  write (11, '(A80)') 'HEADER  data-set: ', trim(input_fn)(:i - 5)
  write (11, '(A80)') 'REMARK   3  MOPAC, Version: ' // trim(version) // ' Date: ' // trim(date(5:11)) // trim(date(21:)) // trim(idate(11:16))
  do i = 1, ncomments
    write (11, '(A80)') 'REMARK   3', trim(all_comments(i))
  end do 
 do i = 1, nbreaks
    if (breaks(i) /= 0) then
      write (11, '(A80)') 'REMARK   3' // ' Charges were multiplied by 0.5 in break' // trim(str(i)) // ' geometry'
    end if
  end do

  write (11, '(A4)') "ROOT"

  do i1 = 1, natoms
    i = txtatm1(i1)
    if (i == 199) then
      ele_pdbqt = " H"
    else
      ele_pdbqt = elemnt(i)
    end if

    write (11, '(A6,I5,A4,A1,A3,A4,I2,A3,3F8.3,2F8.3,2X,F8.3,A2)') &
      "ATOM", i1, " ", trim(ele_pdbqt), "  UNL", i1, "   ", coord(1,i1), &
      coord(2,i1), coord(3,i1), 1.00, temp_factor(i), q2(i), " ", trim(labels(i))

    if (i == nline) then
      ter_ok = .false.
    else if (i == nline + 1) then
      ter_ok = .true.
    else
      ter_ok = .true.
      ter = .true.
      nline = i
    end if

    if (ter_ok) then
      if (branch) then
        write (11, '(A6)') "BRANCH"
        branch = .false.
      else if (ter) then
        write (11, '(A3)') "TER"
        ter = .false.
      end if
    end if
    
    if (labels(i) == " BR") then
      branch = .true.
    end if
  end do

  if (branch) then
    write (11, '(A6)') "BRANCH"
  end if

  write (11, '(A8,I3)') "TORSDOF", torsdof

  close (unit=11)
end subroutine pdbqtout

It requires the ele_pdbqt array of the elements (could be done by ele_pdb?, provided by the pdbout.F90). I'm very much unsure if this is correct approach all together. I haven't coded fortran at all.

Could it also be done like this:

subroutine pdbqtout(input_fn, idate, version, ncomments, all_comments, nbreaks, breaks, natoms, labels, elemnt, txtatm1, q2, coord, temp_factor, ele_pdbqt, pdb_header, pdb_remarks)
  implicit none
  character(len=80), intent(in) :: input_fn
  character(len=10), intent(in) :: idate, version
  integer, intent(in) :: ncomments, nbreaks, natoms
  character(len=80), dimension(ncomments), intent(in) :: all_comments
  integer, dimension(nbreaks) :: breaks
  character(len=4), dimension(natoms), intent(in) :: labels
  integer, dimension(natoms) :: txtatm1
  real(kind=8), dimension(natoms) :: q2
  real(kind=8), dimension(3, natoms), intent(in) :: coord
  real(kind=8), dimension(natoms) :: temp_factor
  character(len=4), dimension(natoms), intent(in) :: elemnt
  character(len=4) :: ele_pdbqt
  character(len=80), intent(in) :: pdb_header
  character(len=80), dimension(:), intent(in) :: pdb_remarks
  integer :: i, i1, i2, k, nline
  logical :: ter, ter_ok
  integer :: maxtxt
  character(len=80) :: line
  character(len=1) :: x

  nline = 0
  ter_ok = .true.
  ter = .true.

  open (unit=11, file=trim(input_fn) // ".pdbqt", status="replace")

  ! Write PDB header
  write (11, '(A80)') trim(pdb_header)

  ! Write PDB remarks
  do i = 1, size(pdb_remarks)
    write (11, '(A80)') trim(pdb_remarks(i))
  end do

  maxtxt = len(txtatm1(1))

  do i = 1, natoms
    i1 = txtatm1(i)
    i2 = i

    nline = nline + 1

    if (ter_ok) then
      ter = (i == breaks(nbreaks))
    end if

    if (ter) then
      nbreaks = nbreaks + 1
      write (11, "(A,I5,A)") "TER", i2, txtatm(i)(18:26)  ! Write out the TERMINAL marker
    end if

    if (labels(i) /= 99 .and. l_atom(i1) .and. elemnt(labels(i))(1:1) /= " ") then
      ele_pdbqt(1:1) = elemnt(labels(i))(1:1)
      ele_pdbqt(2:2) = Char(Ichar("A") - Ichar("a") + Ichar(elemnt(labels(i))(2:2)))
    else
      ele_pdbqt(1:1) = " "
      ele_pdbqt(2:2) = elemnt(labels(i))(2:2)
    end if

    x = txtatm(i)(13:13)
    if (x == "X") then
      txtatm(i)(13:13) = " "
    end if

    write (11, '(A6,I5,A4,A1,A3,A4,I2,A3,3F8.3,2F8.3,2X,F8.3,A2)') "ATOM", i2, txtatm(i)(12:15), " ", txtatm(i)(17:19), i2, txtatm(i)(21:22), &
    & coord(1, i), coord(2, i), coord(3, i), 1.00, temp_factor(i), q2(i), ele_pdbqt

    txtatm(i)(13:13) = x
  end do

  ! Write branch information
  do i = 1, nbreaks
    write (11, "(A8,I5,I5)") "BRANCH", breaks(i), breaks(i) + 1
  end do

  ! Write TORSDOF line
  write (11, "(A8,I1)") "TORSDOF", nbreaks

  close (unit=11)
end subroutine pdbqtout

About SMILES/InChi strings: I do not want to support more input options to MOPAC, nor do I want to add new dependencies like OpenBabel, which would be the easiest way to implement such a feature as you say. MOPAC is too monolithic as it is, and workflows should be able to use OpenBabel and MOPAC in separate steps if they need flexible file format conversion - atomistic simulation and molecular file format conversion are very different tasks that should be handled by very different software.

Giving the option to output PDBQT is a reasonable request that I can probably accommodate in a timely manner. Specifically, the PDB format does seem to provide space in columns 67 - 76 for non-standard extensions, which is reasonable place to insert partial charges as per the PDBQT format. While I'm slightly philosophically opposed to jamming even more file format options into MOPAC, this is actually just supporting an existing format (PDB) in a more featureful manner. It provides a convenient way to extract partial charge information, which is often why people are using MOPAC in the first place.

Giving the option to output PDBQT is a reasonable request that I can probably accommodate in a timely manner. Specifically, the PDB format does seem to provide space in columns 67 - 76 for non-standard extensions, which is reasonable place to insert partial charges as per the PDBQT format. While I'm slightly philosophically opposed to jamming even more file format options into MOPAC, this is actually just supporting an existing format (PDB) in a more featureful manner. It provides a convenient way to extract partial charge information, which is often why people are using MOPAC in the first place.

I would appreciate this very much. I use MOPAC mainly to optimize the geometries of molecules and proteins and use them for docking simulations. The NEB-function is also something I'd like to see as most of my trials with protein TS calculations have been just a waste of time..

The current plan is to support NEB with MOPAC through two external programs: an interface with MDI (still under development) that has its own NEB driver, and through the existing interface with QCEngine/QCArchive that can run NEB calculations using geomeTRIC. Implementing MOPAC's MDI interface is my responsibility right now, and it has been hovering near the top of my list of priorities for a while, and the QCArchive developers have NEB working for internal testing purposes, but I don't think it will be released until the student working on it writes a paper about it.

@godotalgorithm, have you had any time to work on the PDBQT-output option yet?

The PDB output will now include partial charge information in the necessary columns to act as a PDBQT file if you add the PRTCHAR keyword to a job. As far as I can tell from online documentation, the partial charge information is the only difference between a PDB and PDBQT file. If there is other missing information from a proper PDBQT file that MOPAC can provide, then please reopen this Issue and clarify further.

Works like charm! Thank you very much!
Iirc, mopac output has reported temperature factors or the so called B-factor (on pdb, columns 61 - 66). For some reason, it seems to be currently flat 0 with the PRTCHAR. Other than that - great job.

By default, MOPAC has been using the temperature B-factor columns to feed partial charge information to a GUI (the B-factor is the partial charge x10). This is redundant if the optional columns are being used for partial charge information, so I just zeroed them out in that case. If I can adjust this GUI behavior and if the optional columns don't cause problems with other programs, then I might make this the default PDB output behavior before the next MOPAC version release (i.e. PRTCHAR wouldn't be needed for this behavior).

At least for autodock vina (and other docking programs of the like) the output of geo-optimized small ligands is just dandy with the exception that I have to remove the headers and the last row (END) from the output. The output is rigid, but simple obabel conversion obabel -i pdbqt output.pdb -o pdbqt output.pdbqt adds the branches correctly.

PyMOL (2.4.0) does not display the PRTCHAR output of proteins correctly. The resulting pdb needs to be converted (obabel -i pdbqt file.pdb -o pdb -O output.pdb or obabel -i pdbqt file.pdb -o pdbqt -O output.pdbqt (with or without -r).

Yeah, in addition to adding partial charge information to each ATOM/HETATM, the PDBQT format adds "ROOT" and "BRANCH" delimiters to a PDB file and other extra lines such as the number of torsional degrees of freedom, "TORSDOF". MOPAC presently does not have a direct way to perform this partitioning into torsional degrees of freedom, so the remainder of the work is best left to Open Babel, as you are presently doing.

Yup. And that's perfectly fine. This is something which probably should added to manual....
This is rather for people to find the information how-to from search engine.

I am, by the way, willing to writing the man-pages and compiling/mainaining the deb-package for of mopac if needed..

Yes, this type of documentation - common workflows needed to interface MOPAC with other software - is exactly what I want in a list of use cases once I start updating the new MOPAC documentation website again. The website has been dormant for a while, but I will shift my main attention there after 2 more MOPAC releases - an imminent release associated with the new PM6-ORG model in MOPAC, and then a release after I complete the 2 open PRs for new MOPAC interfacing.

I was just informed yesterday by #172 that MOPAC is available on Debian and many other Linux package managers.