=============================================================== DP4-AI version 1.0 Copyright (c) 2020 Alexander Howarth, Kristaps Ermanis, Jonathan M. Goodman distributed under MIT license =============================================================== CONTENTS 1) Release Notes 2) Requirements and Setup 3) Usage 4) NMR Description Format 5) Included Utilites 6) Code Organization =============================================================== RELEASE NOTES DP4-AI version 1.0 is the successor the PyDP4. As described in our latest paper (https://doi.org/10.26434/chemrxiv.11763255.v1) DP4-AI represents a significant increase in functionality over PyDP4. DP4-AI integrates NMR-AI, software for automatic processing, assignment and visualisation of raw NMR data. This functionality affords fully automated DP4 analysis of databases of molecules with no user input. DP4-AI also utilises a fully revised version of PyDP4 updated for Python 3 and with major workflow improvements. Users can now also choose to run DP4-AI within a Graphical user interface for increased ease of use when single DP4 calculations. This graphical user interface can also be used to explore the spectra processed and assigned by NMR-AI. DP4-AI can also be utilised to perform assignments of NMR spectra for a single isomer of a molecule. These assignments can be viewed in the graphical output from DP4-AI or investigated interactively using The included GUI application. More details about the software can be found in the publication DP4-AI Straight from Spectrometer to Structure (https://doi.org/10.26434/chemrxiv.11763255.v1) and in the accompanying supporting information. =============================================================== REQUIREMENTS AND SETUP All the python files and one utility to convert to and from TINKER nonstandard xyz file are in the attached archive. They are set up to work from a centralised location. It is not a requirement, but it is probably best to add the location to the PATH variable. The script currently is set up to use MacroModel for molecular mechanics and Gaussian for DFT and it runs Gaussian on ziggy by default. NWChem and TINKER is also supported. This setup has several requirements. 1) One should have MacroModel or TINKER and NWChem or Gaussian. The DP4-AI folder can contain settings.cfg, where the locations for the installed software packages can be given, so that DP4-AI can launch them and use in the workflow. Examples on how to do this are provided in the included settings.example - this should be renamed to settings.cfg and any relevant changes made. 2) In order to run DP4-AI a python 3.6+ environment the following packages are required: numpy scipy Cython matplotlib nmrglue statsmodels lmfit openbabel rdkit pathos Additionally, the graphical user interface requires PyQT5 3) RDKit and OpenBabel are required for the automatic diastereomer generation and other manipulations of sdf files (renumbering, ring corner flipping), as well as molecule graphical plotting. For OpenBabel this requirement includes Python bindings. The following links provide instructions for building OpenBabel with Python bindings: http://openbabel.org/docs/dev/UseTheLibrary/PythonInstall.html http://openbabel.org/docs/dev/Installation/install.html#compile-bindings Alternatively, the OpenBabel package in conda repository has also lately become more reliable, and is much easier to install. Both OpenBabel 2.4.1 and 3.x are supported. 4) Finally, to run calculations on a computational cluster, a passwordless ssh connection should be set up in both directions - desktop -> cluster and cluster -> desktop. In most cases the modification of the relevant functions in Gaussian.py or NWChem.py will be required to fit your situation. 5) All development and testing was done on Linux. However, both the scripts and all the backend software should work equally well on MacOS with minor adjustments. Windows will require more modification, and is currently not supported. =================== USAGE - GUI To call the script from the Graphical User Interface the folder containing the input files must be opened in terminal and the correct python environment activated. the GUI is then simply called by: python PyDP4_GUI.py This will open the main DP4-AI GUI window, the user can then select the required input files and the settings for MM and DFT calculations. The expected structure file format is *.sdf. In addition DP4-AI can also take inchi, smiles and smarts inputs. To use this feature text file should be prepared with the desired input strings on separate lines, marked with the relevant .smiles, .smarts or .inchi file extension and added using the GUI dialogue. Multiple file inputs in any combination of formats can be added this way Once the DP4-AI has finished all calculations the GUI will open a number of new tabs. These tabs display interactive carbon and proton NMR spectra as processed and assigned by NMR-AI as well as interactive plots of the NMR prediction errors and probabilities and conformer energies and populations. =================== USAGE - Terminal To call the script from terminal: 1) With all diastereomer generation: PyDP4.py Candidate CandidateNMR where Candidate is the sdf file containing 3D coordinates of the candidate structure (without the extension). DP4-AI will automatically detect whether an NMR description (see below) or raw NMR data has been provided by the user. If raw NMR data has been provided (currently in the Bruker file format), NMR-AI will automatically process the NMR spectra provided. In this case the NMR data should be organised into a single folder (passed to DP4-AI) containing one or two folders labelled Proton and (or) Carbon. Alternatively: PyDP4.py -w gmns Candidate CandidateNMR The -w switch specifies the PyDP4 workflow, c specifies structure clean up and 3d coordinate generation utilising rdkit, g specifies diastereomer generation, m molecular mechanics conformational searching, o DFT geometry optimisation, n DFT NMR calculation, e M062X single point energy calculations, s calculate DP4 statistics. The default workflow is gnms, optimum workflow is gnomes. The letter a can be given in place of s, in the instance DP4-AI will analyse and assign the provided NMR spectra but not calculate a DP4 probability. This is perticularly useful in cases where only one isomer of a molecule is being passed to DP4-AI. In addition the -s switch can be used to specify the solvent for use in MM and DFT calculations. Supported solvents are listed in the TMSdata file. Other Gaussian/Jaguar/NWChem supported solvents can be used, but only with manually interpreted data. PyDP4.py -s chloroform Candidate CandidateNMR If solvent is not given, no solvent is used. 2) With explicit diastereomer/other candidate structures: PyDP4.py Candidate1 Candidate2 Candidate3 ... CandidateNMR The script does not attempt to generate diastereomers, simply carries out the DP4 on the specified candidate structures. Structures can also be added from InChI, Smiles and Smarts strings using the designated switches. For example: PyDP4.py --InChI Candidates_inchis.inchi ... CandidateNMR where Candidates_inchs.inchi is a text file with all the desired inchi strings on separate lines. Script has several other switches, including switching the molecular mechanics and dft software etc. -m {t,m}, --mm {t,m} Select molecular mechanics program, t for tinker or m for macromodel, default is t -d {j,g,n,z,w}, --dft {j,g,n,z,w} Select DFT program, j for Jaguar, g for Gaussian, n for NWChem, z for Gaussian on ziggy, w for NWChem on ziggy, default is z (jaguar is not yet implemented) --StepCount STEPCOUNT Specify stereocentres for diastereomer generation -s SOLVENT, --solvent SOLVENT Specify solvent to use for dft calculations -q QUEUE, --queue QUEUE Specify queue for job submission on ziggy (default is s1) -t NTAUT, --ntaut NTAUT Specify number of explicit tautomers per diastereomer given in structure files, must be a multiple of structure files -r, --rot5 Manually generate conformers for 5-memebered rings --ra RA Specify ring atoms, for the ring to be rotated, useful for molecules with several 5-membered rings --AssumeDFTDone Assume RMSD pruning, DFT setup and DFT calculations have been run already (saves time when repeating DP4 analysis) -g, --GenOnly Only generate diastereomers and tinker input files, but don't run any calculations (useful for diastereomer generation for calculations ran on computers without OpenBabel) -c STEREOCENTRES, --StereoCentres STEREOCENTRES Specify stereocentres for diastereomer generation -T, --GenTautomers Automatically generate tautomers -o, --DFTOpt Optimize geometries at DFT level before NMR prediction --pd Use python port of DP4 -b BASICATOMS, --BasicAtoms BASICATOMS Generate protonated states on the specified atoms and consider as tautomers More information on those can be obtained by running PyDP4.py -h ====================== NMR DESCRIPTION FORMAT NMRFILE example begins: 59.58(C3),127.88(C11),127.52(C10),115.71(C9),157.42(C8),133.98(C23),118.22(C22),115.79(C21),158.00(C20),167.33(C1),59.40(C2),24.50(C31),36.36(C34),71.05(C37),142.14(C42),127.50(C41),114.64(C40),161.02(C39) 4.81(H5),7.18(H15),6.76(H14),7.22(H28),7.13(H27),3.09(H4),1.73(H32 or H33),1.83(H32 or H33),1.73(H36 or H35),1.73(H36 or H35),4.50(H38),7.32(H47),7.11(H46) H15,H16 H14,H17 H28,H29 H27,H30 H47,H48 H46,H49 C10,C12 C9,C13 C22,C24 C21,C25 C41,C43 C40,C44 OMIT H19,H51 :example ends Sections are seperated by empty lines. 1) The first section is assigned C shifts, can also be (any). 2) Second section is (un)assigned H shifts. 3) This section defines chemically equivalent atoms. Each line is a new set, all atoms in a line are treated as equivalent, their computed shifts averaged. 4) Final section, starting with a keyword OMIT defines atoms to be ignored. Atoms defined in this section do not need a corresponding shift in the NMR description ===================== CODE ORGANIZATION PyDP4.py file that should be called to start the DP4-AI workflow. Interprets the arguments and takes care of the general workflow logic. PyDP4_GUI.py file that should be called to start the DP4-AI graphical user interface InchiGen.py Gets called if diastereomer and/or tautomer and/or protomer generation is used. Called by PyDP4.py. FiveConf.py Gets called if automatic 5-membered cycle corner-flipping is used. Called by PyDP4.py. MacroModel.py Contains all of the MacroModel specific code for input generation, calculation execution and output interpretation. Called by PyDP4.py. Tinker.py Contains all of the Tinker specific code for input generation, calculation execution and output interpretation. Called by PyDP4.py. ConfPrune.pyx Cython file for conformer alignment and RMSD pruning. Called by Gaussian.py and NWChem.py Gaussian.py Contains all of the Gaussian specific code for input generation and calculation execution. Called by PyDP4.py. GaussianZiggy.py Code specific to running Gaussian on Cambridge CMI local cluster. Called by PyDP4.py. GaussianDarwin.py Code specific to running Gaussian on Cambridge CSD3 HPC cluster. Called by PyDP4.py. NWChem.py Contains all of the NWChem specific code for input generation and calculation execution. Called by PyDP4.py. NMRDP4GTF.py Takes care of all the NMR description interpretation, equivalent atom averaging, Boltzmann averaging, tautomer population optimisation (if used) and DP4 input preparation and running either DP4.jar or DP4.py. Called by PyDP4.py nmrPredictNWChem.py Extracts NMR shifts from NWChem output files DP4.py Equivalent and compact port of the original DP4 java implementation to Python of the same DP4 process. Has been updated to include multigaussian models. Proton_processing.py/Carbon_processing.py NMR-AI python script from processing of raw proton/carbon NMR data Proton_assignment.py/Carbon_assignment.py NMR-AI python script for assignment of raw proton/carbon NMR data Proton_plotting.py/Carbon_plotting NMR-AI python script for plotting of raw proton/carbon NMR data processed and assigned by NMR-AI