Stacking Gaussian Processes to Improve pKa Predictions in the SAMPL7 Challenge
Authors
- Robert M. Raddi
  - Department of Chemistry, Temple University
- Vincent A. Voelz
  - Department of Chemistry, Temple University
This repository contains predictions of relative free energies and macroscopic pKas for SAMPL6 and SAMPL7 small molecules, made with standard Gaussian process regression (GPR) as well as deep Gaussian process regression.
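For readers unfamiliar with standard GPR, the following is a minimal sketch of fitting scikit-learn's GaussianProcessRegressor and obtaining predictive means and uncertainties. It is not the repository's pipeline: the data are synthetic stand-ins for molecular descriptors, and the kernel (RBF plus white noise) is an illustrative assumption rather than the kernel used in this work.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Synthetic stand-in data (hypothetical): 40 "molecules", 3 descriptors each
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = 2.0 * X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=40)

# Simple train/test split: first 30 for training, last 10 held out
X_train, y_train = X[:30], y[:30]
X_test, y_test = X[30:], y[30:]

# RBF kernel plus a white-noise term (an illustrative choice, not the repository's)
kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0)
gpr.fit(X_train, y_train)

# Predictive mean and standard deviation for the held-out points
y_mean, y_std = gpr.predict(X_test, return_std=True)
rmse = float(np.sqrt(np.mean((y_mean - y_test) ** 2)))
print(f"test RMSE: {rmse:.3f}")
```

The per-point standard deviation returned by `return_std=True` is the main practical reason to prefer a GP over a point-estimate regressor when prediction uncertainty matters.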
- GPR/: the code to perform standard and deep GPR, processing, analysis, etc.
- scripts_and_notebooks/: scripts that call on GPR/ (e.g., the script to compute features, analysis notebooks)
  - scripts_and_notebooks/compile_results.ipynb: master notebook for creating figures and tables
  - scripts_and_notebooks/database_info.ipynb: notebook for analyzing the database
  - scripts_and_notebooks/features.py: script for computing descriptors
  - scripts_and_notebooks/standardGPR.py: script for running sklearn.gaussian_process.GaussianProcessRegressor
  - scripts_and_notebooks/runme_deepGP.py: script for running deep Gaussian process regression using deepGPy
- Structures/: input SMILES strings, input microtransitions
- Submissions/: SAMPL7 submission (standard GP model only)
- predictions/: results and prediction files, in separate directories for SAMPL6 and SAMPL7. Each directory contains relative free energies and macroscopic pKa values for each small molecule.
  - predictions/SAMPL6_deepGP/: free energies and macro-pKa predictions
  - predictions/SAMPL6_stdGP/: free energies and macro-pKa predictions
  - predictions/SAMPL7_deepGP/: free energies and macro-pKa predictions
  - predictions/SAMPL7_stdGP/: free energies and macro-pKa predictions
  - predictions/RandomForestSAMPL6/: free energies and macro-pKa predictions
  - predictions/RandomForestSAMPL6_without_filter/: free energies and macro-pKa predictions
  - predictions/RandomForestSAMPL7/: free energies and macro-pKa predictions
  - predictions/RandomForestSAMPL7_without_filter/: free energies and macro-pKa predictions
- pKaDatabase/: curated database from various sources
  - pKaDatabase/pKaDatabase.pkl: pickle file of the database; loads with pandas via pd.read_pickle('pKaDatabase.pkl'). NOTE: feature calculations are already performed and stored inside.
  - pKaDatabase/Sulfonamides.pkl: pickle file of the sulfonamides-only database (loads with pandas)
- tables/: LaTeX tables for free energies and macroscopic pKas
- figures/: macroscopic pKa comparison between std GP and deep GP
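The database pickles listed above load with pandas. Below is a minimal sketch of that step. So that it runs without the repository files, it writes and reads a small stand-in pickle; the helper name `load_database`, the column names, and the assumption that the pickle holds a DataFrame are all hypothetical and not taken from the actual database.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_database(path):
    """Load a pickled object and check that it is a pandas DataFrame.

    Assumes (hypothetically) that the pickle stores a DataFrame.
    """
    obj = pd.read_pickle(path)
    if not isinstance(obj, pd.DataFrame):
        raise TypeError(f"expected a DataFrame, got {type(obj).__name__}")
    return obj


# Stand-in for pKaDatabase.pkl so the sketch runs anywhere.
# Column names and values are illustrative, not the real database schema.
toy = pd.DataFrame({"smiles": ["CC(=O)O", "c1ccccc1O"], "pKa": [4.76, 9.99]})

with tempfile.TemporaryDirectory() as tmp:
    pkl_path = Path(tmp) / "toy_database.pkl"
    toy.to_pickle(pkl_path)          # write the stand-in pickle
    db = load_database(pkl_path)     # read it back with pandas

print(db.shape)
print(list(db.columns))
```

With the repository checked out, the same helper could be pointed at pKaDatabase/pKaDatabase.pkl or pKaDatabase/Sulfonamides.pkl. Because unpickling can execute arbitrary code, only load pickle files from sources you trust.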