The meta-proteomics workflow is an end-to-end pipeline for processing and analyzing MS/MS data to study proteomes, i.e., protein identification and characterization.
We identify the active organisms/species in a metagenome corresponding to a wet-lab sample obtained from JGI after gene sequencing. Researchers at PNNL then culture these samples and prepare them for study as protein samples. A protein sample may contain a single protein or a complex mixture of proteins. The sample is then passed through a mass spectrometry instrument to obtain a proprietary .RAW file. This file contains the MS/MS spectra, i.e., the mass analysis (mass-to-charge (m/z) ratios) for each peptide sequence identified in the sample.
Python codebase:
- Make your input datasets ready, as described here.
- Make your input `storage/` folder visible to the workflow: you need to provide its path in `docker-compose.yml` (a sketch of the volume mount is shown after this list). Note: `./storage/` is left already configured, assuming you kept the inputs in the project directory itself.
- Typically, a study (such as `stegen`) has more than one dataset (RAW files of MS/MS spectra) and multiple FASTAs to search against. This mapping information is required, and a sample is provided here.
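For reference, a minimal sketch of what the `storage/` mount in `docker-compose.yml` might look like; the service name and container path below are assumptions, so keep whatever the repository already ships:

```yaml
# Hypothetical excerpt of docker-compose.yml -- adjust the service name and paths to your checkout.
services:
  workflow:
    volumes:
      - ./storage/:/app/storage/   # host folder with your inputs -> path the workflow sees inside the container
```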
- Configure the workflow as needed. Typically, we run it in one of the following ways:
  - Fully tryptic with no modifications (recommended for large datasets such as Prosser Soil).
  - Fully tryptic with modifications.
  - Partially tryptic with modifications (such as MetOx).
  - Partially tryptic with no modifications.

  Notes:
  - The user needs to tweak the configuration file (a purely illustrative sketch of the settings involved follows this list).
  - To reproduce the results achieved for the FICUS dataset studies (Hess, Stegen, Blanchard), we provide parameter files and a pre-configured env file that can be used to run the workflow.
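As a purely illustrative sketch (not the actual parameter-file or env-file syntax shipped with this repository), the four modes above differ along two axes: enzyme specificity (fully vs. partially tryptic) and whether dynamic modifications such as methionine oxidation (MetOx) are searched:

```yaml
# Hypothetical illustration only -- use the provided parameter/env files for the real settings.
search:
  enzyme_specificity: fully_tryptic    # or: partially_tryptic
  dynamic_modifications: []            # e.g. [MetOx] to also search methionine oxidation
# Fully tryptic with no modifications is the least expensive search, hence the
# recommendation for large datasets such as Prosser Soil.
```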
- You must have docker and docker-compose installed on your system.
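A quick way to confirm both are available before building:

```bash
docker --version
docker-compose --version
```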
- To run the workflow, from the project directory:
  - `make build_unified` to start the services.
  - `make run_workflow` to create the `storage/results` folder and all the necessary files (a full session is shown below).
  - Note: to take the containers down and remove the volumes, run `docker-compose down -v`.
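Put together, a typical end-to-end session from the project root looks like this:

```bash
# Build the images and start the services
make build_unified

# Run the workflow; results are written to storage/results/
make run_workflow

# When you are done, take the containers down and remove the volumes
docker-compose down -v
```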
WDL support codebase:
- Prepare your `input.json`:
  `make prepare-your-input`
  Note: the user needs to generate the `input.json` file based on:
  - the mapping of datasets (`.raw`) to annotations (`.faa` & `.gff`), and
  - the actual locations of the respective files.
  A helper script has been provided for this (a hypothetical sketch is shown below).
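The helper script produces the real schema; purely as a hypothetical sketch, the file ties each RAW dataset to its annotation files and their on-disk locations (all field names and paths below are assumptions):

```json
{
  "dataset_raw": "/data/stegen/dataset_01.raw",
  "annotation_faa": "/data/stegen/annotations/dataset_01.faa",
  "annotation_gff": "/data/stegen/annotations/dataset_01.gff"
}
```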
- Run the WDL. You need:
  - an execution engine (tested with cromwell-66) to run the WDL,
  - along with a Java runtime (tested with `openjdk 12.0.1`).

  1. With docker support: `make run_wdl`
  2. With shifter support (to run on Cori): `make run_wdl_on_cori`
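If you prefer to invoke Cromwell directly instead of going through the make targets, the standard Cromwell command line is shown below; the WDL file name is a placeholder for whichever workflow file this repository provides:

```bash
# Requires a Java runtime (tested with openjdk 12.0.1) and the cromwell-66 jar.
java -jar cromwell-66.jar run workflow.wdl -i input.json
```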