openmopac/mopac

LOCATE-TS run

Closed this issue · 3 comments

I have optimized geometries for two protein models. The first contains a pre-docked ligand and the second contains the pre-docked product that the enzyme should form from that ligand. When I try to run LOCATE-TS for the intermolecular ring-forming reaction with the resulting arc files, openmopac doesn't find the files unless they are renamed and referred to in CAPITALS, e.g. GEO_DAT="./CAPITAL.ARC" GEO_REF="./CAPITAL2.ARC".

When the files are renamed in capitals, I then get an error that the numbers of atoms differ, which they do not. This bug is also sort of referred to in #39. Regretfully, I cannot send in the data set for this one.

          Number of atoms in "./CAPITAL2.ARC" = 1500

          Number of atoms in "./CAPITAL.ARC" = 1499

... they both contain 1499 atoms. After fiddling around, mainly by adding 0SCF and GEO-OK, I managed to get out this output:

                            Differences in atoms sets

          Atoms in GEO_DAT only                   Atoms in GEO_REF only

   1   HETATM 1499 6H   XXX U  92

          Empirical formulae of data-set and GEO_REF are different

          Empirical formula of system in GEO_REF: C463 H765 N123 O145 At S2  =  1499 atoms
          Empirical formula of system in GEO_DAT: C463 H766 N123 O145 At S2  =  1500 atoms

          Number of atoms common to both systems: 1499

(what is that "At"? There is no Astatine :P)

But as stated before: both datasets have 1499 atoms:

The last atom line in GEO_REF file is:

  H(HETATM 1499 6H   XXX U  92)  148.01443414 +1  155.20772990 +1  201.31153461 +1

and the last atom line in GEO_DAT file is:

  H(HETATM 1499 6H   YYY U  92)  151.32436200 +1  157.43023803 +1  196.38540724 +1

The "At" is caused by erroneous creation of the arc-file:

           Empirical Formula: C463 H766 N123 O145 S2  =  1499 atoms

 START_RES=(1U 92U) CHAINS=(UU) MOZYME EPS=78.4 GNORM=0.0 DUMP=10M OPT LET PDBOUT REORTH PL +
 T=70.00D SCFCRT=1.D-20 MMOK CHARGE=+1 RESTART THREADS=2

 ATOM      1  N   LEU U   1     171.464 147.980 235.450  1.0   0.00      PROT N


     HERBERTS TEST WAS SATISFIED IN BFGS                      
     SCF FIELD WAS ACHIEVED                                   

          HEAT OF FORMATION       =      -8856.44881 KCAL/MOL =  -37055.38184 KJ/MOL
          DIELECTRIC ENERGY       =        -73.03570 EV
          GRADIENT NORM           =          5.90183          =       0.15244 PER ATOM
          DIPOLE                  =        281.81314 DEBYE   
          NO. OF FILLED LEVELS    =       2057
          CHARGE ON SYSTEM        =          1
          MOLECULAR WEIGHT        =      10440.0015
          COSMO AREA              =       4273.24 SQUARE ANGSTROMS
          COSMO VOLUME            =      11700.07 CUBIC ANGSTROMS

          MOLECULAR DIMENSIONS (Angstroms)

            Atom       Atom       Distance
            H  1472    H     9    56.18848
            H  1169    O   746    32.01081
            O   643    O   880    26.64339
          SCF CALCULATIONS        =       1346
          WALL-CLOCK TIME         = 6 DAYS  1 HOUR 39 MINUTES AND 36.494 SECONDS
          COMPUTATION TIME        = 6 DAYS  1 HOUR 32 MINUTES AND 29.469 SECONDS


          FINAL GEOMETRY OBTAINED
 START_RES=(1U 92U) CHAINS=(UU) MOZYME EPS=78.4 GNORM=0.0 DUMP=10M OPT LET PDBOUT REORTH PL +
 T=70.00D SCFCRT=1.D-20 MMOK CHARGE=+1 RESTART THREADS=2

 ATOM      1  N   LEU U   1     171.464 147.980 235.450  1.0   0.00      PROT N 
  N(ATOM      1  N   LEU U   1)  169.77574606 +1  149.37940457 +1  235.32341545 +1
...
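A quick way to check an arc file for this kind of stray line is sketched below. It is only an illustration, not part of MOPAC: it assumes that real geometry lines start with an element symbol followed directly by a parenthesised label (as in the excerpt above) and that a stray PDB record starts with ATOM or HETATM. The function name `scan_arc_geometry` is made up for this example.

```python
import re

# Assumption: geometry lines look like "  N(ATOM ... )  x +1  y +1  z +1",
# i.e. an element symbol immediately followed by a parenthesised PDB label.
GEOM_LINE = re.compile(r"^\s*[A-Z][a-z]?\(")
# Assumption: a stray PDB record begins with ATOM or HETATM.
STRAY_PDB = re.compile(r"^\s*(ATOM|HETATM)\b")


def scan_arc_geometry(lines):
    """Return (n_geometry_atoms, stray_pdb_lines) for the final geometry block."""
    in_final = False
    n_atoms = 0
    stray = []
    for line in lines:
        if "FINAL GEOMETRY OBTAINED" in line:
            in_final = True
            continue
        if not in_final:
            continue
        if STRAY_PDB.match(line):
            # A bare PDB record inside the geometry section
            stray.append(line.rstrip())
        elif GEOM_LINE.match(line):
            n_atoms += 1
    return n_atoms, stray
```

Calling it as `scan_arc_geometry(open(path))` on each arc file would report the number of labelled atoms and any bare PDB records that could be read as an extra atom.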

When I remove those PDB-type atom lines, I get the following error, as obviously there are no PDB-type labels in the arc files:


      Atoms in GEO_DAT only                   Atoms in GEO_REF only

    HETATM 1488 1H   YYY U  92              HETATM 1488 1H   XXX U  92
    HETATM 1489 2H1  YYY U  92              HETATM 1489 2H1  XXX U  92
    HETATM 1490 3H1  YYY U  92              HETATM 1490 3H1  XXX U  92
    HETATM 1491 1H1  YYY U  92              HETATM 1491 1H1  XXX U  92
    HETATM 1492 2H2  YYY U  92              HETATM 1492 2H2  XXX U  92
    HETATM 1493 3H2  YYY U  92              HETATM 1493 1H2  XXX U  92
    HETATM 1494 1H2  YYY U  92              HETATM 1494 2H3  XXX U  92
    HETATM 1495 2H3  YYY U  92              HETATM 1495 4H   XXX U  92
    HETATM 1496 1H3  YYY U  92              HETATM 1496 1H3  XXX U  92
    HETATM 1497 2H4  YYY U  92              HETATM 1497 2H4  XXX U  92
    HETATM 1498 4H   YYY U  92              HETATM 1498 3H2  XXX U  92
    HETATM 1499 5H   YYY U  92              HETATM 1499 5H   XXX U  92
    HETATM 1500 6H   YYY U  92              HETATM 1500 6H   XXX U  92

                  (Atom labels from GEO_REF will be used)

                            Differences in atoms sets

          Atoms in GEO_DAT only                   Atoms in GEO_REF only

   1   HETATM 1475  N   YYY U  92              HETATM 1475  N   XXX U  92
   2   HETATM 1476  C   YYY U  92              HETATM 1476  C   XXX U  92
   3   HETATM 1477  C   YYY U  92              HETATM 1477  C   XXX U  92
   4   HETATM 1478  C   YYY U  92              HETATM 1478  C   XXX U  92
   5   HETATM 1479  C   YYY U  92              HETATM 1479  C   XXX U  92
   6   HETATM 1480  C   YYY U  92              HETATM 1480  C   XXX U  92
   7   HETATM 1481  C   YYY U  92              HETATM 1481  C   XXX U  92
   8   HETATM 1482  C   YYY U  92              HETATM 1482  C   XXX U  92
   9   HETATM 1483  C   YYY U  92              HETATM 1483  C   XXX U  92
  10   HETATM 1484  C   YYY U  92              HETATM 1484  C   XXX U  92
  11   HETATM 1485  C   YYY U  92              HETATM 1485  C   XXX U  92
  12   HETATM 1486  O   YYY U  92              HETATM 1486  O   XXX U  92
  13   HETATM 1487  O   YYY U  92              HETATM 1487  O   XXX U  92

          ERRORS DETECTED IN PDB LABELS DURING DOCKING MUST BE CORRECTED BEFORE LOCATE-TS CAN BE RUN.
Or add "GEO-OK" if one of the two data sets has PDB-type labels and the other does not have PDB information.

 ***********************************************************************************************
 *                                                                                             *
 *     Error and normal termination messages reported in this calculation                      *
 *                                                                                             *
 * ERRORS DETECTED IN PDB LABELS DURING DOCKING MUST BE CORRECTED BEFORE LOCATE-TS CAN BE RUN. *
 * JOB ENDED NORMALLY                                                                          *
 *                                                                                             *
 ***********************************************************************************************

Just for giggles, I also tried SADDLE instead of LOCATE-TS. At least this time I got an error saying that the system is too large to run using COSMO. If I take the EPS out and run with SADDLE GEO_DAT="CAPITAL.PDB" GEO_REF="CAPITAL2.PDB" GEO-OK+ CHARGE=1 HTML (yes, with pdb files), then at least it starts the calculations... And it doesn't even need the "GEO-OK" (with or without plus) to run.

I have now identified the problem. If the residue label of a HETATM entry is anything other than "HET", it doesn't work correctly. After running 0SCF RESIDUES individually on the geo_dat and geo_ref files (without XENO, obviously), the labels XXX and YYY were renamed to "HET", and then LOCATE-TS works just dandy with the arc files. But there is still no restart file, and the filenames for geo_ref and geo_dat need to be fully capitalized.
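For anyone hitting the same thing, a small check like the following can list which residue names appear in the HETATM labels of an arc file before LOCATE-TS is run. It is a sketch under assumptions: it takes the label format from the excerpts above, splits the label on whitespace (which works for the labels shown here but not for every PDB file), and the function name `hetatm_residue_names` is made up for this example.

```python
import re
from collections import Counter

# Assumption: each geometry line carries a PDB-style label in parentheses,
# e.g. "H(HETATM 1499 6H   XXX U  92)", as in the excerpts in this thread.
LABEL = re.compile(r"\((HETATM|ATOM)\s+([^)]*)\)")


def hetatm_residue_names(lines):
    """Count the residue names that appear in HETATM labels."""
    counts = Counter()
    for line in lines:
        m = LABEL.search(line)
        if not m or m.group(1) != "HETATM":
            continue
        # Fields after the record name: serial, atom name, residue name, ...
        fields = m.group(2).split()
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts
```

Any key other than `HET` in the returned counter (XXX or YYY in this case) marks labels that were affected by the behaviour described above.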

I don't presently have enough information to reproduce this problem, and so I am as-yet unable to fix it. Can you post an artificial (ideally, short) input file that reproduces this problem?

There is logic in MOPAC to preserve the case of filenames that are placed in quotes, but it might not work in all situations. For example, I suspect that it doesn't work if the filenames are on the second line of the input file when the + keyword line extender is used, but it probably should work with the ++ extender. I'm no longer supporting the + or & extenders because they are not as robustly implemented and add no value (they only remain available for backwards compatibility with old input files). However, I will try to fix bugs associated with ++.
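To illustrate the kind of case preservation described here, the following is a minimal Python sketch of upper-casing a keyword line while leaving double-quoted text untouched. It is not MOPAC's actual implementation (which is not shown in this thread), and the function name and the lower-case file names are hypothetical.

```python
import re


def upcase_outside_quotes(keyword_line):
    """Upper-case a keyword line but keep double-quoted substrings as typed."""
    # Splitting on a captured group keeps the quoted pieces at odd indices.
    parts = re.split(r'("[^"]*")', keyword_line)
    return "".join(
        part if index % 2 else part.upper()
        for index, part in enumerate(parts)
    )


# Hypothetical file names, for illustration only
line = 'locate-ts geo_dat="./reactant.arc" geo_ref="./product.arc"'
print(upcase_outside_quotes(line))
```

If the quoted text is not recognised for some reason, such as a line extender that is handled differently, the whole line falls through to the upper-casing branch, which would match the symptom of files being found only when named in capitals.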

As noted in #39, the lack of restart files is likely because the SCF cycle isn't converging in a timely manner because SCFCRT=1.D-20 is requesting an unattainable level of numerical convergence. Restart files are only produced between completed SCF cycles, so the requested DUMP time is only a lower bound on how often a restart file is produced.
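The timing rule can be made concrete with a toy model. The sketch below is not MOPAC code; it only assumes what is stated above, namely that a restart file can be written only when an SCF cycle completes and only once at least the DUMP interval has passed since the last write. The function name and the cycle durations are invented for illustration.

```python
def restart_write_times(cycle_durations, dump_interval):
    """Elapsed times (s) at which a restart file would be written in this toy model."""
    writes = []
    elapsed = 0.0
    last_write = 0.0
    for duration in cycle_durations:
        elapsed += duration  # a cycle must finish before anything is written
        if elapsed - last_write >= dump_interval:
            writes.append(elapsed)
            last_write = elapsed
    return writes


# DUMP=10M corresponds to 600 s; the cycle lengths are made-up numbers.
print(restart_write_times([400, 400, 400], 600))  # [800.0]
print(restart_write_times([5000], 600))           # [5000.0]
```

In this model a single slow cycle delays the first restart file until that cycle finishes, and if a cycle never finishes, no restart file appears at all.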

The residual problems that you have noted here - the lack of a restart file and the filename capitalization - have already been addressed in other recent Issues and my responses to them. It is unclear that there is any distinct bug remaining here, so I'm closing this issue for now. Feel free to reopen this Issue or post a related Issue with a working example of any remaining bug related to this.