Mappgene is a genomic sequence analysis workflow designed for high-performance computing. It currently wraps V-pipe (https://github.com/cbg-ethz/V-pipe) with a collection of useful scripts for deployment in almost any Linux environment.
pip3 install --user parsl
git clone https://github.com/LLNL/mappgene.git
- Download the Singularity image (https://drive.google.com/file/d/1qOhgiQChtfaMk0VNYpkZvVIYNiGs9ItI/view?usp=sharing) and place it at container/image.sif.
- Copy your reference genome in fasta format to vpipe_files/references/.
- Inspect and configure vpipe_files/vpipe.config for your run (e.g., reference =, trim_percent_cutoff =, threads =).
- Organize your input files (typically gzip compressed fastq formatted paired-end reads). mappgene.py expects the following layout, where sample names are taken from the subdirectories (i.e., foo, bar, baz):

    /path/to/input_dirs
    |-- foo
    |   |-- foo_R1.fastq.gz
    |   `-- foo_R2.fastq.gz
    |-- bar
    |   |-- bar_R1.fastq.gz
    |   `-- bar_R2.fastq.gz
    |-- baz
    |   |...
    ...
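Before submitting a job, it can help to confirm that the input directory matches the layout above. The following is a minimal sketch of such a check, not code taken from mappgene.py: the function name `find_samples` is hypothetical, and it assumes every sample follows the `<sample>/<sample>_R1.fastq.gz` and `<sample>/<sample>_R2.fastq.gz` naming shown in the example tree.

```python
from pathlib import Path


def find_samples(input_dirs):
    """Map each sample subdirectory to its R1/R2 read files.

    Illustrative only: assumes the <sample>/<sample>_R{1,2}.fastq.gz naming
    from the example layout; mappgene.py may discover files differently.
    """
    samples, problems = {}, []
    for subdir in sorted(Path(input_dirs).iterdir()):
        if not subdir.is_dir():
            continue  # only subdirectories define samples
        name = subdir.name
        r1 = subdir / f"{name}_R1.fastq.gz"
        r2 = subdir / f"{name}_R2.fastq.gz"
        missing = [p.name for p in (r1, r2) if not p.is_file()]
        if missing:
            problems.append(f"{name}: missing {', '.join(missing)}")
        else:
            samples[name] = (r1, r2)
    return samples, problems
```

Calling `find_samples("/path/to/input_dirs")` returns the samples that have both read files plus a list of human-readable problems, so incomplete pairs can be fixed before any compute time is requested.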
Specify arguments
Example:
python3 mappgene.py \
--input_dirs /path/to/input_dirs \
--output_dirs /path/to/output_dirs \
--read_length 130 \
--nnodes 16 \
--walltime 12:59:59
OR
python3 mappgene.py <config_json>
More info
python3 mappgene.py --help
- V-pipe and other software have been pre-installed in a container (https://github.com/hpcng/singularity).
- User parameters are parsed from the command line or a configuration JSON file (e.g. configs/example/catalyst.json). Note that every parameter has a default value (see mappgene.py).
- Software tasks are then distributed to compute nodes in parallel (https://github.com/Parsl/parsl).
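The parameter handling described above can be pictured as layers, where a default exists for every parameter and user-supplied values replace it. The sketch below is an illustration under stated assumptions rather than mappgene's actual implementation: the `DEFAULTS` values are made up, the key names simply mirror the command-line flags from the example, `resolve_params` is a hypothetical helper, and letting explicit overrides win over the JSON file is a design choice of this sketch.

```python
import json

# Hypothetical defaults for illustration; the real ones are defined in mappgene.py.
DEFAULTS = {
    "input_dirs": "input",
    "output_dirs": "output",
    "read_length": 130,
    "nnodes": 1,
    "walltime": "11:59:59",
}


def resolve_params(config_json=None, overrides=None):
    """Layer settings: defaults, then a JSON file, then explicit overrides."""
    params = dict(DEFAULTS)
    if config_json is not None:
        with open(config_json) as f:
            params.update(json.load(f))
    # Skip None values so unset options do not mask earlier layers.
    params.update({k: v for k, v in (overrides or {}).items() if v is not None})
    unknown = sorted(set(params) - set(DEFAULTS))
    if unknown:
        raise ValueError(f"unknown parameters: {', '.join(unknown)}")
    return params
```

Rejecting unknown keys catches misspelled parameter names early, instead of silently falling back to a default for the intended setting.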
Mappgene is distributed under the terms of the BSD-3 License.
LLNL-CODE-821512