This simple script shows how to run NinjaMap jobs under the Nextflow framework.
```bash
nextflow run main-local.nf \
  --reads1 's3://nextflow-pipelines/nf-ninjamap/data/read1.fastq.gz' \
  --reads2 's3://nextflow-pipelines/nf-ninjamap/data/read2.fastq.gz' \
  --db HCom2_20221117 \
  --db_prefix HCom2 \
  --output_path 's3://genomics-workflow-core/Results/Ninjamap/local' \
  -profile docker
```
The test dataset is available in the data directory:

```
data/read1.fastq.gz
data/read2.fastq.gz
```
The prebuilt hCom2 ninjaIndex database archive can be downloaded from Zenodo:
https://zenodo.org/record/7872423/files/hCom2_20221117.ninjaIndex.tar.gz
Run a ninjaMap Docker container with a sorted BAM file (a sample aligned to the concatenated reference) and an HCom2 binmap file:
```bash
docker container run \
  -v /host/path/to/indata/:/input_data/ \
  -v /host/path/to/outdata/:/output_data/ \
  fischbachlab/nf-ninjamap \
  python /work/scripts/ninjaMap_parallel_5.py \
    -bin /input_data/db/HCom2.ninjaIndex.binmap.csv \
    -bam /input_data/bam/sample.sortedByCoord.bam \
    -outdir /output_data/summary \
    -prefix HCom2
```
The seedfile is a CSV with a header row that maps each sample name to its paired-end reads on S3:

```
sampleName,R1,R2
sample_name,s3://nextflow-pipelines/nf-ninjamap/data/read1.fastq.gz,s3://nextflow-pipelines/nf-ninjamap/data/read2.fastq.gz
```
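A seedfile in this format can also be generated programmatically. A minimal sketch in Python follows; the sample name and S3 paths are the example values from above, and the output file name is illustrative:

```python
import csv

# Illustrative sample-to-reads mapping; replace with your own S3 paths.
samples = {
    "sample_name": (
        "s3://nextflow-pipelines/nf-ninjamap/data/read1.fastq.gz",
        "s3://nextflow-pipelines/nf-ninjamap/data/read2.fastq.gz",
    ),
}

with open("example_seedfile.csv", "w", newline="") as fh:
    writer = csv.writer(fh)
    writer.writerow(["sampleName", "R1", "R2"])  # header row expected by the pipeline
    for name, (r1, r2) in samples.items():
        writer.writerow([name, r1, r2])
```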
Example 1: submit a NinjaMap job to AWS Batch against the SCv2_4 database:

```bash
aws batch submit-job \
  --job-name nf-ninjamap \
  --job-queue priority-maf-pipelines \
  --job-definition nextflow-production \
  --container-overrides command="fischbachlab/nf-ninjamap, \
    "--seedfile", "s3://genomics-workflow-core/Results/ninjamap/example_seedfile.csv", \
    "--db","SCv2_4_20210212", \
    "--db_prefix", "SCv2_4", \
    "--db_path", "s3://maf-versioned/ninjamap/Index", \
    "--output_path", "s3://genomics-workflow-core/Results/Ninjamap/biohub" "
```
Example 2: submit a job against the HCom2 database:

```bash
aws batch submit-job \
  --job-name nf-ninjamap \
  --job-queue priority-maf-pipelines \
  --job-definition nextflow-production \
  --container-overrides command="fischbachlab/nf-ninjamap, \
    "--seedfile", "s3://genomics-workflow-core/Results/Ninjamap/project/example.seedfile.csv", \
    "--db","HCom2_20221117", \
    "--db_prefix", "HCom2", \
    "--db_path", "s3://maf-versioned/ninjamap/Index", \
    "--output_path", "s3://genomics-workflow-core/Results/Ninjamap/HCom2/project" "
```
Example 3: the database index can also be fetched directly from a URL (here, the HCom2 index hosted on Zenodo), and the input reads can be subsampled with the --sampleRate option:

```bash
aws batch submit-job \
  --job-name nf-ninjamap \
  --job-queue priority-maf-pipelines \
  --job-definition nextflow-production \
  --container-overrides command="fischbachlab/nf-ninjamap, \
    "--seedfile", "s3://genomics-workflow-core/Results/Ninjamap/20221018/seedfile2.csv", \
    "--db","HCom2_20221117", \
    "--db_prefix", "HCom2", \
    "--db_path", "https://zenodo.org/record/7872423/files/hCom2_20221117.ninjaIndex.tar.gz", \
    "--output_path", "s3://genomics-workflow-core/Results/Ninjamap/HCom2/20221018", \
    "--sampleRate", "0.5" "
```
Example 4: AWS Batch job parameters can also be supplied via the -params-file option. A copy of the parameters is automatically saved to a JSON file (parameters.json) in the run's output bucket.
```bash
aws batch submit-job \
  --job-name nf-ninjamap-MITI \
  --job-queue priority-maf-pipelines \
  --job-definition nextflow-production \
  --container-overrides command="fischbachlab/nf-ninjamap, \
    "-params-file", "s3://genomics-workflow-core/Results/Ninjamap/parameters/example_parameters.json" "
```
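The params file is plain JSON mapping parameter names to values. A minimal sketch of writing such a file follows; the parameter values are illustrative placeholders taken from the examples above, not from a real run:

```python
import json

# Illustrative parameter values; substitute your own seedfile, database, and output paths.
params = {
    "seedfile": "s3://genomics-workflow-core/Results/Ninjamap/project/example.seedfile.csv",
    "db": "HCom2_20221117",
    "db_prefix": "HCom2",
    "db_path": "s3://maf-versioned/ninjamap/Index",
    "output_path": "s3://genomics-workflow-core/Results/Ninjamap/HCom2/project",
}

with open("example_parameters.json", "w") as fh:
    json.dump(params, fh, indent=2)
```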
A debug option has been added to the ninjaMap pipeline. Both the debug and coverage options must be set to 1 at the same time. When enabled, the pipeline outputs two sets of BAM files in the ninjaMap/debug/singular and ninjaMap/debug/escrow folders, respectively. Note that the run time may roughly double when the debug option is used.
- Singular set: singular (primary) BAM, BAM index (.bai), and region BED files for each strain.
- Escrow set: escrow BAM, BAM index (.bai), and region BED files for each strain.
The output files are organized into 4 folders.
The alignment files of all input reads aligned to the defined community database, in BAM format.
The running logs of the various scripts.
- *.ninjaMap.abundance.csv: this file shows statistics on the abundance, coverage, and depth of each strain in the defined community
  - Strain_Name: strain name
  - Read_Fraction: the relative abundance in the defined community, as a percentage
  - Percent_Coverage: the average coverage per strain, as a percentage
  - Coverage_Depth: the average coverage depth
Sample abundance output:

```
Strain_Name Read_Fraction Percent_Coverage Coverage_Depth
Acidaminococcus-fermentans-DSM-20731-MAF-2 0.0 0 0
Acidaminococcus-sp-D21-MAF-2 0.0 0 0
Adlercreutzia-equolifaciens-DSM-19450 0.0 0 0
Akkermansia-muciniphila-ATCC-BAA-835-MAF-2 6.521739130434782 0.0376527 0.0004
Alistipes-finegoldii-DSM-17242 0.0 0 0
Alistipes-ihumii-AP11-MAF-2 0.0 0 0
Alistipes-indistinctus-YIT-12060-DSM-22520-MAF-2 0.0 0 0
Alistipes-onderdonkii-DSM-19147-MAF-2 1.4492753623188406 0.00439328 0.0001
Alistipes-putredinis-DSM-17216-MAF-2 0.0 0 0
Alistipes-senegalensis-JC50-DSM-25460-MAF-2 0.0 0 0
Alistipes-shahii-WAL-8301-DSM-19121-MAF-2 0.0 0 0
Anaerofustis-stercorihominis-DSM-17244 0.0 0 0
Anaerostipes-caccae-DSM-14662-MAF-2 0.0 0 0
Anaerotruncus-colihominis-DSM-17241-MAF-2 0.0 0 0
Bacteroides-caccae-ATCC-43185-MAF-2 6.521739130434782 0.0387897 0.0004
Bacteroides-cellulosilyticus-DSM-14838-MAF-2 22.463768115942027 0.0529593 0.000599636
Bacteroides-coprocola-DSM-17136-MAF-2 2.898550724637681 0.0109683 9.75984e-05
................................
```
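The abundance table can be post-processed with standard tools. A minimal sketch that ranks strains by Read_Fraction follows; the column names match the description above, and the helper name and file path are illustrative:

```python
import csv

def top_strains(path, n=3):
    """Return the n most abundant strains from a *.ninjaMap.abundance.csv file,
    sorted by Read_Fraction in descending order."""
    with open(path) as fh:
        rows = list(csv.DictReader(fh))
    rows.sort(key=lambda r: float(r["Read_Fraction"]), reverse=True)
    return [(r["Strain_Name"], float(r["Read_Fraction"])) for r in rows[:n]]
```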
- *.ninjaMap.read_stats.csv: this file shows the statistics of the input reads
  - File_Name: sample name
  - Reads_Aligned: the number of aligned reads
  - Reads_wPerfect_Aln: the number of perfectly aligned reads
  - Reads_wSingular_Votes: the number of reads voted as singular
  - Reads_wEscrowed_Votes: the number of reads voted as escrow
  - Discarded_Reads_w_Perfect_Aln: the number of discarded perfectly aligned reads
- *.ninjaMap.strain_stats.csv: this file shows various statistics for each strain
- *.ninjaMap.votes.csv.gz: the statistics of read voting (singular or escrow)
- adapter_trimming_stats_per_ref.txt: this file shows the statistics of adapter trimming
- read_accounting.csv: this file tracks read-pair counts through the pipeline
  - Sample_Name: sample name
  - Total_Fragments: the total number of input raw read pairs
  - Fragments_After_Trim: the number of read pairs after QC
  - Fragments_Aligned: the number of aligned read pairs
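From these counts, per-sample QC survival and alignment rates can be derived. A minimal sketch follows; the field names match the read_accounting.csv description above, and the helper name is illustrative:

```python
def read_rates(row):
    """Compute QC survival and alignment rates from one read_accounting.csv row
    (a dict of column name to value, as produced by csv.DictReader)."""
    total = float(row["Total_Fragments"])
    after_trim = float(row["Fragments_After_Trim"])
    aligned = float(row["Fragments_Aligned"])
    return {
        "qc_survival_rate": after_trim / total,    # fraction of read pairs surviving QC
        "alignment_rate": aligned / after_trim,    # fraction of QC-passed pairs that aligned
    }
```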
The aggregated outputs are organized into 6 files.
- *.covDepth.csv: this file shows the average coverage depth per strain, by sample
- *.host_contaminants.csv: this file shows the detected host contaminants (human or mouse) by sample, reported when the unaligned-read rate exceeds 5%
- *.long.csv: this is the long format of three files (*.readFraction.csv, *.covDepth.csv, and *.percCoverage.csv)
- *.percCoverage.csv: this file shows the average coverage per strain, as a percentage, by sample
- *.reads_stats.csv: this file shows the read statistics, in read numbers, by sample
- *.readFraction.csv: this file shows the relative abundance in the defined community, as a percentage, by sample
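The long-format file can be pivoted back into a per-metric wide table. A minimal sketch follows, assuming the long file carries sample, strain, metric, and value columns; the exact column names in the real output may differ, and the helper name is illustrative:

```python
from collections import defaultdict

def pivot_long(rows, metric):
    """Pivot long-format records into a {strain: {sample: value}} table
    for a single metric (e.g. readFraction, covDepth, or percCoverage)."""
    table = defaultdict(dict)
    for r in rows:
        if r["Metric"] == metric:
            table[r["Strain"]][r["Sample"]] = float(r["Value"])
    return dict(table)
```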