/metk

Model Evaluation Toolkit

Primary LanguagePython

metk

Model Evaluation Toolkit

In metk, I've collected a set of routines for evaluating predictive models.
I put a lot of this code together when I was doing the evaluation for the
TDT and D3R
projects, as well as
a book chapter I wrote in 2013.
I'm releasing this project as a way for the community to collaborate
and (hopefully) agree on best practices for model evaluation. Most of the
initial release is oriented toward the evaluation of free energy calculations.
This is just a start and I plan to add a lot more. Currently, there are
routines to calculate
  • Root mean squared (RMS) error
  • Mean absolute error (MAE)
  • Pearson correlation coefficient (with confidence limits)
  • Spearman rank correlation (rho) (still need to add confidence limits)
  • Kendall tau (still need to add confience limits)
  • Maximum possible correlation given a specific experimental error. This is based on on a 2009 paper by Brown, Muchmore and Hajduk
Most of the statistics is done with routines from scikitlearn
and scipy.
The toolkit also includes code to generate a few diagnositc plots that I
find helpful when looking at model performance. Examples of these plots can be found
here
  • A scatter plot of experimental vs predicted ΔG. Lines are drawn at 1 and 2 kcal error
  • A histogram of the error distribution.
  • The two plots above with ΔG converted to a binding affinity (in uM or nM). On the scatter plot, lines are drawn at 5-fold and 10-fold error. I find that I mentally relate to a fold error in binding affinity better than I do to error expressed in kcal/mol. However, if you like looking at error in kcal/mol, use that plot.
Ultimately, the plan is to implement a number of other methods for model
evaluation including those described in papers by Anthony Nicholls.

Usage

This relase of metk contains a rudimentary command-line interface. More options
will be added in time.
Usage: metk.py --in INFILE_NAME --prefix OUTFILE_PREFIX [--units UNIT_NAME] [--example]

--in INFILE_NAME         input file name
--prefix OUTFILE_PREFIX  prefix for output file names
--units UNIT_NAME        units to display (uM (default) or nM)
--example                show example command lines

Installation

The toolkit works under both Python 2.7 and Python 3.6. Installation is relatively painless.

  1. Install the dependencies, you can do this with pip

    pip install numpy pandas matplotlib scipy docopt
    
  2. Get the code from github. You can either download and unpack the zip file
    or just clone the repository.
    git clone https://github.com/PatWalters/metk.git
    
  3. There's one more trick to make the plots work with matplotlib. When pip installed
    matplotlib, it created a directory under your home directory called .matplotlib. Create
    a file in this directory called matplotlibrc and put this line in that file.
backend: TkAgg
  1. At this point you should be all set. The main script is metk.py. The other Python
    files need to either be in the same directory or in your PYTHONPATH. You can
    then run the script with this command.
    python metk.py
    
    If you're running under Linux or OS-X and you hate typing "python" all the time
    (I know I do) you can do
    chmod +x metk.py
    ./metk.py
    

A Few Notes

I use tabs

Please don't hesitate to let me know if you run into problems or have additions or improvements.

Pat Walters - July 2017