/metabohunter

A Python interface to the Metabohunter 1D 1H NMR metabolite identification service

Primary LanguagePythonMIT LicenseMIT

MetaboHunter

MetaboHunter is a web service for automated assignment of 1D raw, bucketed or peak picked NMR spectra. Identification is performed in comparison to two publicly available databases (HMDB, MMCD) of NMR standard measurements. More information about the algorithm is available in the published paper:

Tulpan, D., Leger, S., Belliveau, L., Culf, A., Cuperlovic-Culf, M. (2011). MetaboHunter: semi-automatic identification of 1H-NMR metabolite spectra in complex mixtures. BMC Bioinformatics 2011, 12:400

I find the service useful to give a first-pass identification of metabolites from 1D spectra, which can subsequently be confirmed or combined with identification via other methods. I originally wrote a Python interface as a standalone script, then as a Pathomx plugin, and have now moved the code into a reusable Python module with some extra IPython goodness. The walkthrough below demonstrates using the service with standard settings, passing a numpy array of ppms and peak heights. There is also a demo of a simple IP[y] Notebook widget set that can be used to configure the request.

The module and source code is available via PyPi and Github.

Setup

The module is on PyPi and has no funky dependencies. You should be able to instal the metabohunter from the command line:

pip install metabohunter

To use the module simply import it. The main module object provides two useful things: a request function that performs the request to the MetaboHunter service and a IPyMetaboHunter which provides nice widgets for IPython Notebooks and a synchronized config dictionary that can be passed to requests.

import metabohunter as mh
import numpy as np
import os
os.environ['http_proxy'] = ''

Input format

To make a request to the MetaboHunter service you need to provide two lists (or 1D numpy arrays) of ppm values (the x axis scale on an NMR spectra) and peak heights (y axis). Here we create some dummy data using an 50-element axis of 0-10 in 0.2 increments, together with a 50-element series of peak heights generated randomly.

ppms = np.arange(0,10,0.2)
peaks = np.random.random(50)*10
ppms
array([ 0. ,  0.2,  0.4,  0.6,  0.8,  1. ,  1.2,  1.4,  1.6,  1.8,  2. ,
        2.2,  2.4,  2.6,  2.8,  3. ,  3.2,  3.4,  3.6,  3.8,  4. ,  4.2,
        4.4,  4.6,  4.8,  5. ,  5.2,  5.4,  5.6,  5.8,  6. ,  6.2,  6.4,
        6.6,  6.8,  7. ,  7.2,  7.4,  7.6,  7.8,  8. ,  8.2,  8.4,  8.6,
        8.8,  9. ,  9.2,  9.4,  9.6,  9.8])
peaks
array([ 8.31680605,  6.04419835,  6.89353176,  6.00962915,  4.41208152,
        3.2333172 ,  1.39946687,  6.4614129 ,  6.20912024,  0.06888817,
        7.42894489,  6.7128017 ,  0.79111548,  8.85208481,  4.9710428 ,
        4.95762437,  9.82106628,  3.3606115 ,  8.71282185,  9.6313281 ,
        5.1396787 ,  6.90228616,  4.12455523,  3.71683751,  1.77995641,
        1.87159547,  5.43813402,  6.26325801,  9.17281811,  2.507874  ,
        0.64188688,  5.03782693,  6.93223808,  8.59120112,  2.95107901,
        9.70824585,  1.30386675,  1.02667654,  2.46923911,  9.02715511,
        2.42110673,  5.2022395 ,  8.79650171,  7.06068795,  9.45386543,
        4.38466017,  0.22570328,  3.25368676,  0.63608104,  6.98335382])

Performing a request

The results are returned back in a list of the same length as the input array. Mapped metabolites are represented by their Human Metabolome Database (HMDB) identifier whereas unmapped peaks are represented by None.

hmdbs = mh.request(ppms,peaks)
hmdbs
[None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 'HMDB00766',
 None,
 'HMDB00210',
 'HMDB01919',
 'HMDB01919',
 None,
 None,
 'HMDB00210',
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 'HMDB00763',
 'HMDB00617',
 'HMDB00763',
 'HMDB00259',
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None]

To throw away the None's and get the ppm values for the mapped metabolites you can do something like:

[(ppm, hmdb) for ppm, hmdb in zip(ppms, hmdbs) if hmdb is not None]
[(2.0, 'HMDB00766'),
 (2.4000000000000004, 'HMDB00210'),
 (2.6000000000000001, 'HMDB01919'),
 (2.8000000000000003, 'HMDB01919'),
 (3.4000000000000004, 'HMDB00210'),
 (6.8000000000000007, 'HMDB00763'),
 (7.0, 'HMDB00617'),
 (7.2000000000000002, 'HMDB00763'),
 (7.4000000000000004, 'HMDB00259')]

IPython Candy

To make the metabohunter module a bit nicer to work with from within IP[y] Notebooks, the module provides a simple class for generating widgets to control settings. The class is initialised with the default settings for the request, however you can pass additional variables (any of the keyword arguments allowed for request).

mhi = mh.IPyMetaboHunter(confidence=0.1, tolerance=0.5)

Once the objet is created you can call .display() to render the widgets in the current cell. Any changes to the variables are stored back into the IPyMetaboHunter class object (here mhi) and available in subsequent calculations.

mhi.display()

metabohunter-widgets.png

mhi.settings
{'confidence': 0.1,
 'database': 'HMDB',
 'frequency': '600',
 'metabotype': 'All',
 'method': 'HighestNumberNeighbourhood',
 'noise': 0.0,
 'ph': 'ph7',
 'solvent': 'water',
 'tolerance': 0.5}

The widgets manager makes the keyword arguments for the request available via a kwargs property. To provide these to the request function as keyword arguments we just need to unfurl it into the function call using **. Try adjusting the parameters above and seeing how they affect the results when re-running the request.

mh.request(ppms,peaks,**mhi.kwargs)
[None,
 None,
 None,
 None,
 None,
 'HMDB00172',
 'HMDB00011',
 'HMDB00518',
 'HMDB00510',
 'HMDB00510',
 'HMDB00518',
 'HMDB00510',
 'HMDB01547',
 'HMDB01547',
 'HMDB00101',
 'HMDB00208',
 'HMDB00192',
 'HMDB00162',
 'HMDB00014',
 'HMDB00122',
 'HMDB01401',
 'HMDB00272',
 'HMDB00902',
 'HMDB00085',
 None,
 None,
 'HMDB00215',
 None,
 'HMDB00393',
 None,
 None,
 None,
 None,
 None,
 'HMDB01392',
 'HMDB00617',
 'HMDB00303',
 'HMDB01406',
 None,
 None,
 'HMDB00232',
 'HMDB00902',
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None]