SEXCMD workflow relies on sex-specific markers to determine the sex of the analyzed sample. Forked from SEXCMD repo by Seongmun Jeong The workflow will produce a report which provides a prediction of sex of the input sample. Accuracy estimate is provided unless sex is UNDEFINED. SEXCMD will not predict sex if there is not enough information (reads aligned to sex-specific loci).
java -jar cromwell.jar run sexcmd.wdl --inputs inputs.json
Parameter | Value | Description |
---|---|---|
inputFastq |
File | input fastq file |
inputBam |
File | Input alignments in .bam format for the same fastq file |
outputFileNamePrefix |
String | Output file prefix |
sexcmdReport.sequencingType |
Int | 1:Whole Exome Seq, 2:RNA-Seq, 3:Whole Genome Seq |
Parameter | Value | Default | Description |
---|---|---|---|
makeStats.jobMemory |
Int | 6 | Memory (GB) allocated for this job |
makeStats.timeout |
Int | 4 | Number of hours before task timeout |
makeStats.modules |
String | "samtools/1.14" | Modules needed to run SEXCMD |
makeStats.outputPrefix |
String | "STATREPORT" | Custamizable output file name prefix |
sexcmdReport.referenceFasta |
String | "$HG38_SEXCMD_RES_ROOT/sex_marker_filtered.hg38.final.fasta" | Marker reference file, comes from a resource module |
sexcmdReport.outputFilePrefix |
String | "SEXCMD" | Prefix of the report file to be provisioned |
sexcmdReport.sexModelType |
String | "XY" | Sex determination system type : XY or ZW |
sexcmdReport.jobMemory |
Int | 10 | Memory (GB) allocated for this job |
sexcmdReport.timeout |
Int | 4 | Number of hours before task timeout |
sexcmdReport.modules |
String | "hg38-sexcmd-res/1.0 sexcmd/1.0" | Modules needed to run SEXCMD |
estimateAccuracy.minLength |
Int | 80 | Hard threshold for minimum length, if average read length is less we have low accuracy (<60%) |
estimateAccuracy.jobMemory |
Int | 4 | Memory (GB) allocated for this job |
estimateAccuracy.timeout |
Int | 4 | Number of hours before task timeout |
Output | Type | Description |
---|---|---|
reportFile |
File | Report generated by the tool |
This section lists command(s) run by SEXCMD workflow
- Running SEXCMD workflow
The workflow accepts a single fastq file which may be any of R1 or R2 files for paired reads sequencing. In a case when the workflow fails to find enough information to deduce the sex of the donor it will still generate a report which would say that sex is UNDEFINED
set -euo pipefail
samtools stats INPUT_BAM | grep ^SN > OUTPUT.stats
touch PREFIX.OUTPUT
Rscript SEXCMD.R
-m REFERENCE_FASTA
-t SEQUENCING_TYPE
-s SEX_MODEL
-f INPUT_FASTQ
-p OUTPUT_PREFIX
python3 snippet:
outputFile = "PREFIX.OUTPUT"
reportString = "Sex_Determination\tRatio of X and Y counts is NA\tSex of this sample BASENAME_OF_FASTQ is UNDEFINED\n"
numberOfLines = 0
with open(outputFile, "r") as inp:
numberOfLines = len(inp.readlines())
inp.close()
if numberOfLines == 0:
"""Write down the string"""
with open(outputFile, "w") as m:
m.writelines(reportString)
m.close()
touch ~{outputFileNamePrefix}.report
python3 snippet:
coefficients = {"unmapped ratio": {"intercept": 0.9346, "slope": -0.3658},
"insert size standard deviation:": {"intercept": 0.9068, "slope": -0.0001}}
def convert_value(value: str):
try:
return int(value)
except ValueError:
return float(value)
print(f'Unable to convert [{value}], recording as-is')
return value
metrics = {}
with open("~{bamStatFile}", "r") as samstats:
data_lines = [line.strip() for line in samstats if line.startswith('SN')]
for line in data_lines:
fields = line.split("\t")
if len(fields) >= 3:
metrics[fields[1]] = convert_value(fields[2])
if "reads unmapped:" in metrics.keys() and "sequences:" in metrics.keys() and metrics["sequences:"] > 0:
metrics["unmapped ratio"] = round(metrics["reads unmapped:"]/metrics["sequences:"], 3)
accuracy = 1.0
if "average length:" in metrics.keys() and metrics["average length:"] <= ~{minLength}:
accuracy = 0
else:
for c in coefficients.keys():
if c in metrics.keys():
estimated = coefficients[c]["intercept"] + metrics[c]*coefficients[c]["slope"]
if estimated < accuracy:
accuracy = round(estimated, 3)
with open("~{sexcmdReport}", "r") as sexinfo:
infolines = [line for line in sexinfo]
if accuracy > 0:
infolines.append(f'Estimated Accuracy is ~ {accuracy*100}%\n')
else:
infolines.append("Estimated Accuracy is < 60%\n")
with open("~{outputFileNamePrefix}.report", "w") as out:
out.writelines(infolines)
For support, please file an issue on the Github project or send an email to gsi@oicr.on.ca .
Generated with generate-markdown-readme (https://github.com/oicr-gsi/gsi-wdl-tools/)