Code that combines machine learning and thermodynamics for the prediction of solubility related properties. To run this code, and to make predictions, you need the ML models. Code including ML models is available as a conda package, or through an online webtool. The models in SolProp_ML are trained with databases from the SolProp data collection.
- Requirements
- Installation
- Supported solvents and solutes
- Definitions of inputs and outputs
- Sample Jupyter Notebook files for solubility and property prediction
- Example of making ML predictions and solubility calculations from a csv input file
- Web Interface
- How to Cite
- License Information
SolProp_ML has been so far tested to work on Mac and Linux OS. It may not work on Windows.
SolProp_ML can either be installed from conda or from source (i.e. directly from this git repo).
Both options require conda, so first install Miniconda from https://conda.io/miniconda.html.
conda install -c fhvermei solprop_ml
pip install git+https://github.com/bp-kelley/descriptastorus
git clone https://github.com/fhvermei/SolProp_ML.git
cd SolProp_ML
(SolProp_ML
is the path to where you cloned the git repository)conda env create -f environment.yml
conda activate env_solprop
pip install git+https://github.com/bp-kelley/descriptastorus
pip install -e .
or add the path to theSolProp_ML
to thePYTHONPATH
in your .bash_profile or .bashrc file- Download the machine learning model files from here.
- Copy the
Gsolv
,Hsolv
, andSaq
folders from theMachine_learning_models
folder of the downloaded file and place them underSolProp_ML/solvation_predictor/trained_models/
SolProp_ML currently supports prediction for only electrically neutral solute compounds containing H, B, C, N, O, S, P, F, Cl, Br, and I and nonionic liquid solvents. Mixture solvents or solutes are not supported. Predictions for any out-of-range solvents and solutes won't be reliable.
The definitions of prediction inputs and outputs are described in the sample jupyter notebook files.
Please refer to the .ipynb
files located under SolProp_ML/sample_files/
and input.csv
files located under each
example folder.
Input and output definitions are also listed under Example of making ML predictions and solubility calculations from a csv input file.
Sample Jupyter Notebook (ipython) files for doing solubility and property predictions are located under SolProp_ML/sample_files/
.
There are two Jupyter notebook files, example_script.ipynb
and example_with_csv_inputs.ipynb
.
Refer to those files to learn how to use the SolProp_ML model.
The solute/solvent smiles or inchis are read from (1) a csv file, or (2) a pandas dataframe. The columns names must be 'solvent', 'solute', 'temperature' 'reference_solubility', and 'reference_solvent'. Only the column 'solute' is required, all other columns are optional depending on the property that you want to predict or calculate.
Options:
path --> path to csv input file
df --> pandas dataframe instead of path to csv file
gsolv --> predict the solvation free energy of a 'solute' in a 'solvent' at 298K in [kcal/mol]
hsolv --> predict the solvation enthalpy of a 'solute' in a 'solvent' at 298K in [kcal/mol]
saq --> predict the aqueous solubility of a 'solute' in water at 298K in [log10(mol/L)]
solute_parameters --> predict the Abraham Solute Parameters of a 'solute'
reduced_number --> use a reduced number of models in the model ensemble for fast predictions with a lower accuracy
validate_smiles --> validate the smiles before predictions and convert InChI to smiles (takes time)
export_csv --> path to csv file if you want to export results
logger --> path to log file
from ... import calculate_solubility.predict_property
predictions = predict_property(path='./input.csv', df=None, gsolv=True/False, hsolv=True/False, saq=True/False, solute_parameters=True/False, reduced_number=True/False, validate_smiles=True/False, export_csv='./results.csv', logger='./log.log')
This predicts logS, which is in log10 of mol solute per L solution unit (log10(mol/L)).
Options:
path --> path to csv input file
df --> pandas dataframe instead of path to csv file
reduced_number --> use a reduced number of models in the model ensemble for fast predictions with a lower accuracy
validate_smiles --> validate the smiles before predictions and convert InChI to smiles (takes time)
calculate_aqueous --> use aqueous solubility to calculate solubility in other solvents, even if reference solubilities are available
export_csv --> path to csv file if you want to export results
export_detailed_csv --> path to csv file if you want to export results
logger --> path to log file
from ... import calculate_solubility
results = calculate_solubility(path='./input.csv', df=None, validate_smiles=True/False, calculate_aqueous=True/False, reduced_number=True/False, export_csv='./results.csv', export_detailed_csv='./detailed_results.csv', logger='./log.log')
A user-friendly online webtool is available on here under the Solid Solubility Prediction tool.
If you use this software for research, please cite the paper (link to be added soon) as follows:
Vermeire, F. H.; Chung, Y.; Green, W. H. Predicting Solubility Limits of Organic Solutes for a Wide Range of Solvents and Temperatures. 2021, [submitted]
SolProp_ML is a free, open-source software package distributed under the Creative Commons Attribution 4.0 International.