microbe_masst

Using MASST or fastMASST, adding metadata onto a tree ontology for microbes

Primary language: Jupyter Notebook. License: MIT.


Welcome to microbeMASST

This repository contains code for the different domain-specific MASSTs currently under development in the Dorrestein Lab at UCSD. This includes microbeMASST, plantMASST, foodMASST, and globalMASST (temporary name). Aggregated outputs of the different MASSTs can be generated using metadataMASST.

The code for the standalone web applications, which search one spectrum at a time, can be found in GNPS_MASST.

Find the web apps here:

  1. microbeMASST
  2. plantMASST
  3. foodMASST
  4. metadataMASST

Find the publications associated with the different MASSTs here:

  1. microbeMASST - Nature Microbiology
  2. plantMASST - biorxiv
  3. foodMASST - npj Science of Food

Fast Search via microbeMASST enables batch search of multiple spectra against multiple domain-specific MASSTs at once

Running jobs.py lets you use the Fast Search API to run a batch search of multiple MS/MS spectra against the data currently indexed in GNPS/MassIVE (November 2023). It generates the following outputs for all the listed domain-specific MASSTs simultaneously:

  1. A series of interactive HTML tree files, one for each domain-specific MASST, ending with _domain.html (e.g., _microbe.html).
  2. A series of JSON files of the trees (e.g., _microbe.json).
  3. A _matches.tsv file containing all the scans in the indexed data that match your spectrum of interest. This also includes samples that are not part of the listed domain-specific MASSTs.
  4. A _library.tsv file containing a list of spectra from the GNPS libraries that match your spectrum of interest. This enables level 2 annotation according to the Metabolomics Standards Initiative.
  5. A _datasets.tsv file containing the number of samples matching your spectrum for each dataset included in the current index.
  6. A series of _count_domain.tsv files containing information on the matches found for each domain-specific MASST.
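
To check which of the outputs listed above exist after a run, a small helper along these lines may be useful. This is a minimal sketch and not part of the repository: the function names are hypothetical, only the `_microbe` suffix is confirmed by the examples above, and the `plant` and `food` suffixes, as well as the `_count_microbe.tsv` naming, are assumptions derived from the `_domain` pattern.

```python
from pathlib import Path

# Assumed domain suffixes: only "microbe" appears in the examples above;
# "plant" and "food" are guesses based on the _domain naming pattern.
DOMAINS = ["microbe", "plant", "food"]


def expected_outputs(output_prefix, domains=DOMAINS):
    """Return the output file names expected for one output prefix."""
    # Files that are produced once per search, independent of the domain.
    names = [
        f"{output_prefix}_matches.tsv",
        f"{output_prefix}_library.tsv",
        f"{output_prefix}_datasets.tsv",
    ]
    # Files that are produced once per domain-specific MASST.
    for domain in domains:
        names.append(f"{output_prefix}_{domain}.html")
        names.append(f"{output_prefix}_{domain}.json")
        names.append(f"{output_prefix}_count_{domain}.tsv")
    return names


def missing_outputs(output_prefix, domains=DOMAINS):
    """Return the expected output files that do not exist on disk."""
    return [p for p in expected_outputs(output_prefix, domains) if not Path(p).exists()]
```

Calling `missing_outputs("output_directory/output_prefix")` returns the expected files that have not been written yet. A spectrum with no matches in a given domain may legitimately produce no file for that domain, so a missing file is a hint rather than an error.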

Execute batch run

  1. Open jobs.py and add entries to the files list in the form ("input_directory/input_file", "output_directory/output_prefix").
  2. Check the search parameters, such as the minimum cosine score, the m/z tolerance, and the minimum number of matching peaks, and adjust them to your research question.
  3. Run jobs.py
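
As an illustration of step 1, the entries could look like the sketch below. The paths are placeholders, and `check_entries` is a hypothetical helper that is not part of jobs.py. It only verifies that each entry is an (input file, output prefix) pair and that the input has one of the extensions this README mentions (.mgf, .csv, .tsv).

```python
from pathlib import Path

# Placeholder entries in the (input_file, output_prefix) form described above.
files = [
    ("input_directory/input_file.mgf", "output_directory/output_prefix"),
    ("input_directory/usi_list.tsv", "output_directory/usi_prefix"),
]

# Input types named in this README: one .mgf file or a USI list as .csv/.tsv.
SUPPORTED_EXTENSIONS = {".mgf", ".csv", ".tsv"}


def check_entries(entries):
    """Return a list of human-readable problems found in the entries."""
    problems = []
    for entry in entries:
        if len(entry) != 2:
            problems.append(f"expected (input_file, output_prefix), got {entry!r}")
            continue
        input_file, _output_prefix = entry
        if Path(input_file).suffix.lower() not in SUPPORTED_EXTENSIONS:
            problems.append(f"unsupported input type: {input_file}")
    return problems
```

An empty list from `check_entries(files)` means every entry has the expected shape. The check does not confirm that the input files exist or that the search will succeed.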

Note:

  1. The input can be either a single .mgf file (generated via MZmine or by the molecular networking workflow in GNPS) or a list of USIs provided as a .csv or .tsv file.
  2. Run jobs.py several times with the option skip_existing=True until no new output is generated. Because of the Fast Search API, some entries will fail on any given run; subsequent re-runs should catch all possible matches.
  3. Please make sure to use Python 3.10.
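
The repeated runs described in note 2 can be automated with a small loop. This is a minimal sketch under several assumptions: skip_existing=True is already set in jobs.py, jobs.py can be started as a plain script from the current directory, and "no new output" can be detected by counting files in the output directory. All function names here are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def count_outputs(output_dir):
    """Count all files below output_dir (0 if the directory is empty)."""
    return sum(1 for p in Path(output_dir).rglob("*") if p.is_file())


def rerun_until_stable(run_once, output_dir, max_runs=5):
    """Call run_once() until a run adds no new files, or max_runs is reached.

    Returns the number of runs performed.
    """
    previous = count_outputs(output_dir)
    for attempt in range(1, max_runs + 1):
        run_once()
        current = count_outputs(output_dir)
        if current == previous:
            # This run produced nothing new, so stop here.
            return attempt
        previous = current
    return max_runs


def run_jobs():
    """Assumed invocation: run jobs.py with the current Python interpreter."""
    # check=False: individual failures are expected and handled by re-running.
    subprocess.run([sys.executable, "jobs.py"], check=False)
```

With these definitions, `rerun_until_stable(run_jobs, "output_directory")` keeps calling jobs.py until one run adds no files. The `max_runs` cap guards against an entry that never succeeds.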

How to cite?

Please cite the following paper: microbeMASST: a taxonomically informed mass spectrometry search tool for microbial metabolomics data