PyRMD is a Ligand-Based Virtual Screening tool written in Python and powered by machine learning. The project is being developed by the Cosconati Lab at the University of Campania Luigi Vanvitelli and is supported by the AIRC Fellowship for Italy Clementina Colombatti.
Manuscript under review.
Authors: Dr. Giorgio Amendola and Prof. Sandro Cosconati
First, users should download and install Anaconda.
Once Anaconda has been installed, download the files from this repository and, from the terminal (Linux, macOS) or the Command Prompt (Windows), enter:
conda env create -f pyrmd_environment.yml
Follow the instructions appearing on the terminal until the environment installation is complete.
Adjust the configuration_file.ini file with a text editor according to your preferences. To use PyRMD, activate the pyrmd conda environment:
conda activate pyrmd
Then, you are ready to run the software:
python PyRMD_v1.01.py configuration_file.ini
If you need a clean configuration file, run PyRMD without any arguments:
python PyRMD_v1.01.py
This automatically generates a default_config.ini file with default settings.
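Before editing a freshly generated configuration file, it can help to list what it contains. The following is a minimal sketch that assumes the file follows standard INI syntax readable by Python's built-in configparser; the helper name summarize_config is hypothetical and is not part of PyRMD. It deliberately makes no assumption about section or option names.

```python
import configparser


def summarize_config(path):
    """Return {section: {option: value}} for an INI file.

    No section or option names are assumed; whatever the file defines
    is reported as-is (option names are lowercased by configparser).
    """
    # interpolation=None keeps literal '%' characters in values intact.
    parser = configparser.ConfigParser(interpolation=None)
    # read() silently skips missing files, so check what was loaded.
    loaded = parser.read(path, encoding="utf-8")
    if not loaded:
        raise FileNotFoundError(path)
    return {section: dict(parser[section]) for section in parser.sections()}


# Example usage (run it in the folder holding the generated file):
# for section, options in summarize_config("default_config.ini").items():
#     print(f"[{section}]")
#     for name, value in options.items():
#         print(f"  {name} = {value}")
```

Printing the sections this way gives a quick overview of the available settings before changing them in a text editor.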
The tutorials folder contains two test cases, one for the benchmark mode and another for the screening mode, with all the files and configurations already set up. Users only need to run PyRMD in the respective folders.
The benchmark test case lets you benchmark PyRMD performance using the target bioactivity data downloaded from ChEMBL for the tyrosine kinase MET. The benchmark employs a Repeated Stratified K-Fold approach with 5 folds and 3 repetitions. MET decoy compounds downloaded from DUD-E are also included in the folder to be used as an additional test set. These settings are specified in the configuration_benchmark.ini file, which can be easily modified.
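To illustrate what a repeated stratified K-fold scheme with 5 folds and 3 repetitions produces, here is a simplified, standard-library-only sketch. It is not PyRMD's own code: the function repeated_stratified_kfold and the toy labels are hypothetical, and in practice a library routine such as scikit-learn's RepeatedStratifiedKFold is the usual choice.

```python
import random
from collections import defaultdict


def repeated_stratified_kfold(labels, n_splits=5, n_repeats=3, seed=0):
    """Yield (train_idx, test_idx) pairs.

    Each fold keeps roughly the same class ratio as `labels`, and the
    whole partition is reshuffled for every repetition.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    for _ in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        for label in sorted(by_class):
            members = by_class[label][:]
            rng.shuffle(members)
            # Deal each class round-robin so every fold gets its share.
            for pos, idx in enumerate(members):
                folds[pos % n_splits].append(idx)
        for k in range(n_splits):
            test_idx = sorted(folds[k])
            train_idx = sorted(
                i for j, fold in enumerate(folds) if j != k for i in fold
            )
            yield train_idx, test_idx


# Toy labels: 1 = active, 0 = inactive (illustrative, not the MET data).
labels = [1] * 10 + [0] * 40
splits = list(repeated_stratified_kfold(labels))
print(len(splits))  # 5 folds x 3 repetitions = 15 train/test splits
```

Every compound appears in exactly one test fold per repetition, which is why metrics are averaged over all 15 splits rather than reported for a single one.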
To activate the conda environment and run the benchmark, enter:
conda activate pyrmd
python PyRMD_v1.01.py configuration_benchmark.ini
At the end of the calculations, the benchmark_results.csv file will include the benchmark metrics (TPR, FPR, Precision, F-Score, ROC AUC, PRC AUC, and BEDROC) averaged across all folds and repetitions. The plots ROC_curve.png and PRC_curve.png will also be generated.
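For readers less familiar with the threshold-based metrics listed above, the sketch below shows how TPR, FPR, Precision, and an F-score can be derived from confusion-matrix counts and then averaged over splits. It uses the standard textbook definitions and assumes the F1 variant of the F-score; it is not taken from PyRMD, and the counts are made up for illustration.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute TPR, FPR, precision and F1 from confusion-matrix counts."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0        # recall on actives
    fpr = fp / (fp + tn) if (fp + tn) else 0.0        # inactives called active
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # predicted actives that are real
    denom = precision + tpr
    # Assumption: F1 (beta = 1); the F-score variant used by PyRMD may differ.
    f1 = 2 * precision * tpr / denom if denom else 0.0
    return {"TPR": tpr, "FPR": fpr, "Precision": precision, "F1": f1}


def average_metrics(per_split):
    """Average each metric over a list of per-split metric dicts."""
    keys = per_split[0].keys()
    return {k: sum(m[k] for m in per_split) / len(per_split) for k in keys}


# Made-up counts for two splits, for illustration only.
per_split = [
    classification_metrics(tp=8, fp=2, tn=38, fn=2),
    classification_metrics(tp=6, fp=4, tn=36, fn=4),
]
print(average_metrics(per_split))
```

ROC AUC, PRC AUC, and BEDROC depend on the full ranking of compounds rather than on a single set of counts, so they are not covered by this sketch.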
The screening test case trains PyRMD with the MET ChEMBL bioactivity data (the same used in the benchmark) and proceeds to screen a small sample of compounds randomly extracted from MCULE. These settings are specified in the configuration_screening.ini file, which can be easily modified.
To activate the conda environment and run the screening, enter:
conda activate pyrmd
python PyRMD_v1.01.py configuration_screening.ini
At the end of the calculations, the database_predictions.csv file will report a summary of the molecules predicted to be active against MET. For each compound, the file will include the molecule SMILES string, the RMD confidence score (the higher the better), the most similar training active compound and its relative similarity, and a flag indicating whether it is a potential PAINS. The predicted_actives.smi SMILES file will also be created, ready to be used with other cheminformatics or molecular modeling software.
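Since a higher RMD confidence score is better, a common follow-up is to rank the predictions and keep the top entries. The sketch below assumes the predictions file is comma-separated with a header row; the helper top_predictions is hypothetical, and the column name in the usage comment is a placeholder, so check the header of your own file for the real names.

```python
import csv


def top_predictions(path, score_col, n=10, keep=None):
    """Return the n rows with the highest value in `score_col`.

    `keep` is an optional function taking a row (a dict of strings) and
    returning True for rows to retain, e.g. to drop flagged compounds.
    """
    with open(path, newline="", encoding="utf-8") as fh:
        rows = list(csv.DictReader(fh))
    if keep is not None:
        rows = [row for row in rows if keep(row)]
    # Scores are read as text, so convert to float before sorting.
    rows.sort(key=lambda row: float(row[score_col]), reverse=True)
    return rows[:n]


# Hypothetical usage; "confidence" is a placeholder column name:
# best = top_predictions("database_predictions.csv", score_col="confidence", n=20)
# for row in best:
#     print(row)
```

The keep argument is left as a caller-supplied function because the exact encoding of the PAINS flag in the file is not specified here.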
PyRMD implements the Random Matrix Discriminant (RMD) algorithm devised by Lee et al. to identify small molecules endowed with biological activity. Parts of the RMD algorithm code were adapted from the MATLAB version of the RMD and from a Python implementation of Random Matrix Theory proposed by Laksh Aithani.