metaPro

Workflow for meta-proteomics analysis




[Workflow diagram]




About

The meta-proteomics workflow is an end-to-end pipeline for processing and analyzing MS/MS data to study proteomes, i.e., for protein identification and characterization.

We identify the active organisms/species in a metagenome corresponding to a wet-lab sample obtained from JGI after gene sequencing. Researchers at PNNL then culture these samples and prepare them for study as protein samples. A protein sample may contain a single protein or a complex mixture of proteins. The sample is then run through a mass spectrometry instrument to obtain a proprietary-format .RAW file. This file contains the MS/MS spectra, i.e., the mass analysis (mass-to-charge (m/z) ratios) for each peptide sequence identified in the sample.
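The workflow handles these spectra internally, but for intuition, here is a minimal sketch of what MS/MS data looks like once a .RAW file has been converted to an open format such as mzML (e.g., with msconvert or ThermoRawFileParser). It uses the third-party pyteomics library, which is not part of this repository.

```python
# Minimal sketch, NOT part of this workflow: inspect MS/MS spectra from an
# mzML file (a .RAW file converted with msconvert/ThermoRawFileParser).
# Requires: pip install pyteomics
from pyteomics import mzml

with mzml.read("sample.mzML") as spectra:
    for spectrum in spectra:
        if spectrum.get("ms level") == 2:          # MS/MS scans only
            mz = spectrum["m/z array"]             # mass-to-charge ratios
            intensity = spectrum["intensity array"]
            print(spectrum["id"], len(mz), "peaks")
            break                                  # show just the first scan
```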

How to run the workflow:

  • Python codebase:

    1. Prepare your input datasets, as described here.

      • Make your input storage/ folder visible to the workflow by providing its path in docker-compose.yml (see the sketch after this list). Notes:
        • ./storage/ is already configured, assuming you keep the inputs in the project directory itself.
        • Typically, a study (such as Stegen) has more than one dataset (RAW files with MS/MS spectra) and multiple FASTAs to search against. This information is required, and a sample is provided here.
    2. Configure the workflow as needed. Typically, we run it in one of the following ways (see the configuration sketch after this list):

      1. Fully tryptic with no modifications (recommended for large datasets such as Prosser Soil)
      2. Fully tryptic with modifications
      3. Partially tryptic with modifications (such as MetOx)
      4. Partially tryptic with no modifications. Note: Users need to tweak the configuration file. To reproduce the results achieved for the FICUS dataset studies (Hess, Stegen, Blanchard), we provide parameter files and a pre-configured env file that can be used to run the workflow.
    3. Make sure Docker and docker-compose are installed on your system.

    4. To run the workflow, from the project directory:

      1. make build_unified to start the services. (To take the containers down and remove volumes: docker-compose down -v.)
      2. make run_workflow to run the workflow. It creates a storage/results folder with all the necessary output files.
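For step 1 above, the volume mapping in docker-compose.yml might look like the following. This is an illustrative sketch: the service name and container-side path here are assumptions, so take the real ones from the repository's docker-compose.yml.

```yaml
# Illustrative sketch only -- check the repository's docker-compose.yml for
# the real service name and container-side path.
services:
  workflow:                       # hypothetical service name
    volumes:
      - ./storage:/app/storage    # host inputs -> path visible to the workflow
```

With a mapping like this, files you drop into ./storage/ on the host appear to the pipeline inside the container.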
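For step 2 above, the four run modes boil down to two knobs: cleavage specificity (fully vs. partially tryptic) and modification searching (e.g., MetOx). The variable names below are hypothetical placeholders for illustration only; the real keys live in the parameter files and pre-configured env file shipped with the repository.

```
# Hypothetical illustration of the two knobs behind the four run modes;
# use the provided parameter/env files for the real keys.
CLEAVAGE_SPECIFICITY=fully-tryptic   # or: partially-tryptic
DYNAMIC_MODS=none                    # or e.g.: MetOx
```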

  • WDL support codebase:

    1. Prepare your input.json: make prepare-your-input. Note: Users need to generate the input.json file based on the mapping of datasets (RAW) to annotations (.faa & .gff) and the actual file locations. A helper script is provided for this (see the sketch after this list).
    2. Run the WDL. You need:
      • an execution engine (tested with cromwell-66) to run the WDL
      • a Java runtime (tested with openjdk 12.0.1)
        1. With Docker support: make run_wdl
        2. With Shifter support, to run on Cori: make run_wdl_on_cori
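Prefer the helper script shipped with the repository; the sketch below only illustrates the idea of walking a storage/ tree and emitting the dataset-to-annotation mapping as input.json. All key names here are assumptions, so align them with the inputs the repository's WDL actually declares.

```python
# Illustrative sketch of building an input.json that maps each RAW dataset to
# its annotation files. The key names are ASSUMPTIONS -- use the repository's
# helper script / WDL input declarations for the real schema.
import json
from pathlib import Path

storage = Path("storage")
entries = []
for raw in sorted(storage.glob("**/*.raw")):
    stem = raw.stem
    faa = next(storage.glob(f"**/{stem}*.faa"), None)   # protein annotations
    gff = next(storage.glob(f"**/{stem}*.gff"), None)   # gene annotations
    if faa and gff:
        entries.append({
            "raw_file": str(raw),
            "annotation_faa": str(faa),
            "annotation_gff": str(gff),
        })

Path("input.json").write_text(json.dumps({"datasets": entries}, indent=2))
```

Once input.json exists, a typical Cromwell invocation looks like java -jar cromwell-66.jar run <workflow>.wdl --inputs input.json; the make targets above wrap the repository's actual commands.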

More about the workflow...

Documentation