ipstone/modules

cluster_samples fails when running mouse WES samples

ipstone opened this issue · 3 comments

It fails with the following job report:

Started at Sat Jan 2 13:12:09 2021
Terminated at Sat Jan 2 13:12:15 2021
Results reported at Sat Jan 2 13:12:15 2021

Exited with exit code 1.

Resource usage summary:

CPU time :                                   10.76 sec.
Max Memory :                                 1 GB
Average Memory :                             0.60 GB
Total Requested Memory :                     16.00 GB
Delta Memory :                               15.00 GB
Max Swap :                                   -
Max Processes :                              4
Max Threads :                                62
Run time :                                   6 sec.
Turnaround time :                            6 sec.

The output (if any) follows:

INFO 13:12:12,445 HelpFormatter - --------------------------------------------------------------------------------
INFO 13:12:12,447 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.1-1-gcfc45fd, Compiled 2014/03/31 11:48:54
INFO 13:12:12,447 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 13:12:12,447 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 13:12:12,450 HelpFormatter - Program Args: -S LENIENT -T UnifiedGenotyper -nt 4 -R /home/ipstone/share/reference/Mus_musculus_GRCm38/Mus_musculus.GRCm38.71.dna.chromosome.genome.fa --dbsnp /home/ipstone/share/reference/mgp.v5.merged.snps_all.dbSNP142.vcf.gz -I bam/study-sample.bam -L /home/ipstone/share/reference/dbsnp_tseq_intersect.bed -o snp_vcf/study-sample.snps.vcf --output_mode EMIT_ALL_SITES
INFO 13:12:12,454 HelpFormatter - Executing as ipstone@lt13 on Linux 3.10.0-957.12.2.el7.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_45-b18.
INFO 13:12:12,454 HelpFormatter - Date/Time: 2021/01/02 13:12:12
INFO 13:12:12,454 HelpFormatter - --------------------------------------------------------------------------------
INFO 13:12:12,455 HelpFormatter - --------------------------------------------------------------------------------
INFO 13:12:13,015 GenomeAnalysisEngine - Strictness is LENIENT
INFO 13:12:13,091 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 250
INFO 13:12:13,098 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 13:12:13,145 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.05
INFO 13:12:14,961 GATKRunReport - Uploaded run statistics report to AWS S3

ERROR ------------------------------------------------------------------------------------------
ERROR A USER ERROR has occurred (version 3.1-1-gcfc45fd):
ERROR This means that one or more arguments or inputs in your command are incorrect.
ERROR The error message below tells you what is the problem.

It seems the cluster_samples target uses the wrong dbSNP subset in its code:

These are probably human SNPs:

ifeq ($(EXOME),true)
DBSNP_SUBSET ?= $(HOME)/share/reference/dbsnp_137_exome.bed
else
DBSNP_SUBSET = $(HOME)/share/reference/dbsnp_tseq_intersect.bed
endif
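The failing GATK command in the log passed dbsnp_tseq_intersect.bed to -L, which means the non-exome branch was taken for that run. A change to the exome branch therefore only takes effect when EXOME is true. The following minimal standalone Makefile reproduces only this selection logic, so you can check which file a given invocation picks. The file name demo.mk and the show target are hypothetical and exist only for illustration.

```make
# demo.mk (hypothetical): reproduces only the DBSNP_SUBSET selection
# from clusterSamples.mk, so the chosen file can be inspected.
ifeq ($(EXOME),true)
# '?=' assigns only if DBSNP_SUBSET is not already defined
# (for example, by an environment variable).
DBSNP_SUBSET ?= $(HOME)/share/reference/dbsnp_137_exome.bed
else
# Plain '=' overrides an environment variable (unless make runs with -e),
# but not a variable given on the make command line.
DBSNP_SUBSET = $(HOME)/share/reference/dbsnp_tseq_intersect.bed
endif

.PHONY: show
show: ; @echo DBSNP_SUBSET=$(DBSNP_SUBSET)
```

Running `make -f demo.mk show` prints the dbsnp_tseq_intersect.bed path, and `make -f demo.mk EXOME=true show` prints the dbsnp_137_exome.bed path. Passing `DBSNP_SUBSET=/path/to/mouse.bed` (a placeholder path) on the make command line overrides both branches. The same value set only as an environment variable takes effect only in the `?=` (exome) branch.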

I modified clusterSamples.mk as follows:

ifeq ($(EXOME),true)
#DBSNP_SUBSET ?= $(HOME)/share/reference/dbsnp_137_exome.bed
DBSNP_SUBSET ?= $(HOME)/share/reference/mus_musculus_known_genes_exons_GRCm38_noheader.bed
# -- modified subset to the mouse exome region
else
DBSNP_SUBSET = $(HOME)/share/reference/dbsnp_tseq_intersect.bed
endif

Will test this out once current run is done.

This modification makes cluster_samples work properly.
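If the same module has to serve both human and mouse exomes, one option is to select the subset by organism instead of editing clusterSamples.mk for each project. This is a sketch only. It assumes a hypothetical ORGANISM variable, which is not known to exist in ipstone/modules. Like the modification above, it only changes the exome branch. The non-exome branch still points at the human dbsnp_tseq_intersect.bed file, so mouse targeted-sequencing runs would need their own BED file.

```make
# Sketch: ORGANISM is a hypothetical variable, not part of ipstone/modules.
ORGANISM ?= human

ifeq ($(EXOME),true)
ifeq ($(ORGANISM),mouse)
# Mouse GRCm38 exon intervals (the file used in the modification above).
DBSNP_SUBSET ?= $(HOME)/share/reference/mus_musculus_known_genes_exons_GRCm38_noheader.bed
else
# Original (human) exome subset.
DBSNP_SUBSET ?= $(HOME)/share/reference/dbsnp_137_exome.bed
endif
else
# Unchanged: still the human targeted-sequencing subset.
DBSNP_SUBSET = $(HOME)/share/reference/dbsnp_tseq_intersect.bed
endif

# Hypothetical helper target to print the selected file.
.PHONY: show
show: ; @echo DBSNP_SUBSET=$(DBSNP_SUBSET)
```

Saved as a standalone file (for example, a hypothetical subset.mk), `make -f subset.mk EXOME=true ORGANISM=mouse show` prints the mouse exon BED path. Omitting ORGANISM keeps the original human behavior. The nested ifeq blocks keep the existing EXOME switch intact rather than replacing it.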