Strelka variant caller in somatic mode
java -jar cromwell.jar run strelkaSomatic.wdl --inputs inputs.json
Parameter | Value | Description |
---|---|---|
tumorBam |
File | Input BAM file with tumor data |
tumorBai |
File | BAM index file for tumor data |
normalBam |
File | Input BAM file with normal data |
normalBai |
File | BAM index file for normal data |
reference |
String | Reference assembly id |
Parameter | Value | Default | Description |
---|---|---|---|
bedFile |
String? | None | BED file designating regions to process |
numChunk |
Int? | None | If BED file given, number of chunks in which to split each chromosome |
outputFileNamePrefix |
String | "strelkaSomatic" | Prefix for output files |
Parameter | Value | Default | Description |
---|---|---|---|
splitIntervals.gatk |
String | "$GATK_ROOT/bin/gatk" | GATK executable path |
splitIntervals.splitIntervalsExtraArgs |
String? | None | Additional arguments for the 'gatk SplitIntervals' command |
splitIntervals.nonRefModules |
String | "gatk/4.1.2.0" | Environment modules other than the genome refence |
splitIntervals.memory |
Int | 32 | Memory allocated for job |
splitIntervals.overhead |
Int | 8 | Memory overhead for running on a node |
splitIntervals.timeout |
Int | 72 | Hours before task timeout |
convertIntervalsToBed.modules |
String | "python/3.7" | Environment modules |
convertIntervalsToBed.memory |
Int | 16 | Memory allocated for job |
convertIntervalsToBed.timeout |
Int | 4 | Hours before task timeout |
configureAndRunParallel.nonRefModules |
String | "python/2.7 samtools/1.9 strelka/2.9.10" | Environment module names other than genome reference |
configureAndRunParallel.jobMemory |
Int | 16 | Memory allocated for job |
configureAndRunParallel.threads |
Int | 4 | Number of threads for processing |
configureAndRunParallel.timeout |
Int | 4 | Hours before task timeout |
snvsVcfGather.modules |
String | "gatk/4.1.2.0" | Environment module names and version to load (space separated) before command execution |
snvsVcfGather.gatk |
String | "$GATK_ROOT/bin/gatk" | GATK to use |
snvsVcfGather.memory |
Int | 16 | Memory allocated for job |
snvsVcfGather.timeout |
Int | 12 | Hours before task timeout |
indelsVcfGather.modules |
String | "gatk/4.1.2.0" | Environment module names and version to load (space separated) before command execution |
indelsVcfGather.gatk |
String | "$GATK_ROOT/bin/gatk" | GATK to use |
indelsVcfGather.memory |
Int | 16 | Memory allocated for job |
indelsVcfGather.timeout |
Int | 12 | Hours before task timeout |
Output | Type | Description | Labels |
---|---|---|---|
snvsVcf |
File | VCF file with SNVs, .gz compressed | vidarr_label: snvsVcf |
indelsVcf |
File | VCF file with indels, .gz compressed | vidarr_label: indelsVcf |
This section lists command(s) run by strelkaSomatic workflow
- Running strelkaSomatic
set -eo pipefail
~{writeBed}
~{indexFeatureFile}
configureStrelkaSomaticWorkflow.py \
--normalBam ~{normalBam} \
--tumorBam ~{tumorBam} \
--referenceFasta ~{refFasta} \
~{regionsBedArg} \
--runDir .
./runWorkflow.py -m local -j ~{threads}
.bed files use 0-based index, we need to do a proper conversion of interval files
python3 <<CODE
import os, re
intervalFiles = re.split(",", "~{sep=',' intervalFiles}")
for intervalFile in intervalFiles:
items = re.split("\.", os.path.basename(intervalFile))
items.pop() # remove .interval_list suffix
bedName = ".".join(items)+".bed"
with open(intervalFile, 'r') as inFile, open(bedName, 'w') as outFile:
for line in inFile:
if not re.match("@", line): # omit the GATK header
fields = re.split("\t", line.strip())
fields[1] = str(int(fields[1]) - 1)
outFile.write("\t".join(fields)+"\n")
CODE
SplitIntervals is used to produce a requested number of files to make the analysis parallel, decreasing demand for resources per chunk and improving speed
set -eo pipefail
mkdir interval-files
ln -s ~{refFai}
ln -s ~{refDict}
~{gatk} --java-options "-Xmx~{memory-8}g" SplitIntervals \
-R ~{refFasta} \
~{intervalsArg} \
~{scatterArg} \
--subdivision-mode BALANCING_WITHOUT_INTERVAL_SUBDIVISION \
-O interval-files \
~{splitIntervalsExtraArgs}
cp interval-files/*.interval_list .
This task uses GatherVcfs tools from GATK suite which needs the files to be ordered. No overlap between covered features allowed.
set -eo pipefail
~{gatk} GatherVcfs \
-I ~{sep=" -I " vcfs} \
-R ~{refFasta}
-O ~{outputName}
gzip ~{outputName}
For support, please file an issue on the Github project or send an email to gsi@oicr.on.ca .
Generated with generate-markdown-readme (https://github.com/oicr-gsi/gsi-wdl-tools/)