[Master Thesis 2017] Scripts for calculating metrics to assess the performance of drug design software.

Primary language: Python

awROC_calculation

Module for generating Receiver Operating Characteristic (ROC) curves and calculating ROC Enrichments (ROCE) and the Area Under the Curve (AUC), both in their standard form and in their average weighted modification (awROC curve, awROCE, awAUC), which takes the clustering of active compounds into account. These metrics are used in benchmarking tests of virtual screening tools to assess the performance of the software.

The module reads the virtual screening ranking produced by PharmScreen, Pharmacelera's tool for ligand-based virtual screening (see the webpage of Pharmacelera). The calculation outputs a CSV file with the enrichments (at false positive fractions of 0.5%, 1%, 2% and 5%) and the AUC, and a PNG file with the ROC curve.

ROC

The ROC curve shows how well the tool distinguishes between two populations: true active compounds and decoys (inactive molecules). The X and Y values of the ROC curve at a given point are calculated as follows:

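$$
X\% = \frac{N_{\text{decoys retrieved}}}{N_{\text{decoys}}}, \qquad Y = \frac{N_{\text{actives retrieved}}}{N_{\text{actives}}}
$$

where: X% is the fraction of the decoys retrieved at the chosen position of the virtual screening ranking, and Y is the fraction of the active compounds retrieved at the same position.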

Dividing the Y value of a point by its X value gives the ROC Enrichment (ROCE) at the corresponding fraction of retrieved decoys. The AUC is the area under the whole ROC curve.
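The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the module's actual code: the function names, the boolean input format (`True` for an active, `False` for a decoy, in ranking order) and the convention of reading the enrichment at the first ranking position where the decoy fraction reaches the requested value are all assumptions made for the example.

```python
def roc_points(labels):
    """Return ROC points (x, y) for a ranking given as booleans.

    labels: ranking-ordered iterable, True = active, False = decoy
    (hypothetical input format chosen for this sketch).
    """
    labels = list(labels)
    n_actives = sum(labels)
    n_decoys = len(labels) - n_actives
    if n_actives == 0 or n_decoys == 0:
        raise ValueError("ranking needs at least one active and one decoy")
    points = [(0.0, 0.0)]
    actives_seen = decoys_seen = 0
    for is_active in labels:
        if is_active:
            actives_seen += 1
        else:
            decoys_seen += 1
        # X = fraction of decoys retrieved, Y = fraction of actives retrieved
        points.append((decoys_seen / n_decoys, actives_seen / n_actives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def enrichment(points, decoy_fraction):
    """Y / X at the first point whose X reaches decoy_fraction (assumed convention)."""
    for x, y in points:
        if x > 0 and x >= decoy_fraction:
            return y / x
    raise ValueError("decoy_fraction is never reached")


# Toy ranking: 4 actives and 4 decoys
ranking = [True, True, False, True, False, False, True, False]
curve = roc_points(ranking)
print(area_under_curve(curve), enrichment(curve, 0.25))
```

With real screening data the decoy set is large enough for fractions such as 0.5% to be meaningful; the toy ranking only has four decoys, so the example reads the enrichment at 25%.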

awROC

The average weighted modification includes information about the clustering of active compounds in order to evaluate the tool's ability to retrieve new scaffolds. The modified equation for the awROC curve points and awROC Enrichments is as follows:

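$$
Y_{aw} = \frac{1}{N_{\text{clusters}}} \sum_{j=1}^{N_{\text{clusters}}} \sum_{i=1}^{N_j} w_{ij}\,\alpha_{ij}^{X\%}, \qquad \text{awROCE}_{X\%} = \frac{Y_{aw}}{X\%}
$$

where: $w_{ij} = 1/N_j$ is the weight of the i-th structure from the j-th cluster, $N_j$ is the number of structures in the j-th cluster, and $N_{\text{clusters}}$ is the number of clusters. $\alpha_{ij}^{X\%}$ is 1 if the i-th structure of the j-th cluster has already appeared in the chosen fraction of the dataset and 0 otherwise.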

Similarly to the standard ROC curve, the awROC Enrichment can be calculated by dividing the Y point value by the X point value of the curve and awAUC is simply the area under the obtained curve.
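The weighting can be illustrated with a short Python sketch. It is not the module's implementation: the input format (a ranking-ordered list of cluster labels, with `None` marking a decoy), the function names and the normalisation of Y by the number of clusters (so that the curve ends at 1) are assumptions made for this example.

```python
from collections import Counter


def awroc_points(ranking):
    """Return awROC points (x, y) for a ranking of cluster labels.

    ranking: ranking-ordered list; each entry is the cluster label of an
    active compound, or None for a decoy (hypothetical input format).
    """
    cluster_sizes = Counter(c for c in ranking if c is not None)
    n_clusters = len(cluster_sizes)
    n_decoys = sum(1 for c in ranking if c is None)
    if n_clusters == 0 or n_decoys == 0:
        raise ValueError("ranking needs at least one active and one decoy")
    points = [(0.0, 0.0)]
    weighted_actives = 0.0
    decoys_seen = 0
    for cluster in ranking:
        if cluster is None:
            decoys_seen += 1
        else:
            # Each active contributes w_ij = 1 / N_j, so every cluster
            # adds up to 1 once all of its members are retrieved.
            weighted_actives += 1.0 / cluster_sizes[cluster]
        points.append((decoys_seen / n_decoys, weighted_actives / n_clusters))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def aw_enrichment(points, decoy_fraction):
    """Y / X at the first point whose X reaches decoy_fraction (assumed convention)."""
    for x, y in points:
        if x > 0 and x >= decoy_fraction:
            return y / x
    raise ValueError("decoy_fraction is never reached")


# Toy ranking: cluster "A" has 3 actives, cluster "B" has 1, plus 4 decoys
ranking = ["A", "A", None, "B", None, "A", None, None]
curve = awroc_points(ranking)
print(area_under_curve(curve), aw_enrichment(curve, 0.25))
```

In this toy ranking the two top-ranked actives belong to the same cluster, so at 25% of decoys retrieved they contribute only (1/3 + 1/3) / 2 = 1/3 to Y, whereas the unweighted ROC curve would give 2/4 = 1/2. Retrieving many members of one scaffold is therefore rewarded less than retrieving members of different clusters.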