Error in SplitNCigarReads step
slagtermaarten opened this issue · 3 comments
Hi
Thanks for developing and (maintaining?) this pipeline!
I tried to run it but ran into some issues. Do you have any ideas?
ERROR ~ Error executing process > '3_rnaseq_gatk_splitNcigar (S31)'
Caused by:
Process `3_rnaseq_gatk_splitNcigar (S31)` terminated with an error exit status (1)
Command executed:
# SplitNCigarReads and reassign mapping qualities
java -jar /DATA/resources/gatk/GATK-3.7/GenomeAnalysisTK.jar -T SplitNCigarReads -R Homo_sapiens.GRCh38.dna.primary_assembly.fa -I Aligned.sortedByCoord.out.bam -o split.bam -rf ReassignOneMappingQuality -RMQF 255 -RMQT 60 -U ALLOW_N_CIGAR_READS --fix_misencoded_quality_scores
Command exit status:
1
Command output:
(empty)
Command error:
INFO 01:01:07,799 HelpFormatter - --------------------------------------------------------------------------------
INFO 01:01:07,801 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.7-0-gcfedb67, Compiled 2016/12/12 11:21:18
INFO 01:01:07,801 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
INFO 01:01:07,802 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk
INFO 01:01:07,802 HelpFormatter - [Wed Mar 06 01:01:07 CET 2019] Executing on Linux 4.4.0-142-generic amd64
INFO 01:01:07,802 HelpFormatter - OpenJDK 64-Bit Server VM 1.8.0_191-8u191-b12-2ubuntu0.16.04.1-b12
INFO 01:01:07,806 HelpFormatter - Program Args: -T SplitNCigarReads -R Homo_sapiens.GRCh38.dna.primary_assembly.fa -I Aligned.sortedByCoord.out.bam -o split.bam -rf ReassignOneMappingQuality -RMQF 255 -RMQT 60 -U ALLOW_N_CIGAR_READS --fix_misencoded_quality_scores
INFO 01:01:07,813 HelpFormatter - Executing as m.slagter@coley on Linux 4.4.0-142-generic amd64; OpenJDK 64-Bit Server VM 1.8.0_191-8u191-b12-2ubuntu0.16.04.1-b12.
INFO 01:01:07,813 HelpFormatter - Date/Time: 2019/03/06 01:01:07
INFO 01:01:07,814 HelpFormatter - --------------------------------------------------------------------------------
INFO 01:01:07,814 HelpFormatter - --------------------------------------------------------------------------------
INFO 01:01:07,889 GenomeAnalysisEngine - Strictness is SILENT
INFO 01:01:08,231 GenomeAnalysisEngine - Downsampling Settings: No downsampling
INFO 01:01:08,241 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 01:01:08,286 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.04
INFO 01:01:08,537 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
INFO 01:01:08,545 GenomeAnalysisEngine - Done preparing for traversal
INFO 01:01:08,546 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 01:01:08,546 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 01:01:08,547 ProgressMeter - Location | reads | elapsed | reads | completed | runtime | runtime
INFO 01:01:08,572 ReadShardBalancer$1 - Loading BAM index data
INFO 01:01:08,574 ReadShardBalancer$1 - Done loading BAM index data
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 3.7-0-gcfedb67):
##### ERROR
##### ERROR This means that one or more arguments or inputs in your command are incorrect.
##### ERROR The error message below tells you what is the problem.
##### ERROR
##### ERROR If the problem is an invalid argument, please check the online documentation guide
##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
##### ERROR
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions https://software.broadinstitute.org/gatk
##### ERROR
##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
##### ERROR
##### ERROR MESSAGE: Bad input: while fixing mis-encoded base qualities we encountered a read that was correctly encoded; we cannot handle such a mixture of reads so unfortunately the BAM must be fixed with some other tool
##### ERROR ------------------------------------------------------------------------------------------
So I found out how to get rid of this message here.
Adapting the Nextflow script by removing the --fix_misencoded_quality_scores flag seems to do the trick.
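For reference, here is a minimal sketch of the adjusted command, assuming the same file names and jar path that appear in the log above. The function name split_n_cigar and the GATK_JAR and DRY_RUN variables are hypothetical conveniences, not part of the pipeline. As far as I understand, the flag is meant for BAMs whose base qualities use the older Phred+64 encoding, and the error message indicates that at least some reads here are already correctly encoded, so dropping it looks reasonable for this data.

```shell
# Hypothetical wrapper around the command from the log, with
# --fix_misencoded_quality_scores removed. Paths are taken from the log;
# adjust them for your own install.
GATK_JAR="${GATK_JAR:-/DATA/resources/gatk/GATK-3.7/GenomeAnalysisTK.jar}"

split_n_cigar() {
  # When DRY_RUN is set and non-empty, the command is only printed, not executed.
  ${DRY_RUN:+echo} java -jar "$GATK_JAR" \
    -T SplitNCigarReads \
    -R Homo_sapiens.GRCh38.dna.primary_assembly.fa \
    -I Aligned.sortedByCoord.out.bam \
    -o split.bam \
    -rf ReassignOneMappingQuality -RMQF 255 -RMQT 60 \
    -U ALLOW_N_CIGAR_READS
}

# Print the command first to check it; call split_n_cigar without DRY_RUN to run it.
DRY_RUN=1 split_n_cigar
```

In the pipeline itself, the equivalent change is simply deleting the flag from the script block of the 3_rnaseq_gatk_splitNcigar process.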
I cannot rule out that this 'bug' was introduced by my use of slightly different versions of the required programs. I tried to run the Docker image but couldn't run Nextflow in there. I then opted for a local install (without ensuring I had exactly the same versions of the dependencies as you've used) but eventually ran into the issues detailed here.
This is definitely a GATK-related issue. The pipeline is provided as a template for implementing a variant-calling data analysis, but it's not meant to be production quality.
Sorry, this command returns an empty .bam file for me, but why?
java -jar $GATKjar -T SplitNCigarReads -R ./hs37d5.fa -I ./dupmarked.bam -o ./dupmarked_output.bam -U ALLOW_N_CIGAR_READS
Any suggestions, please?
Thanks a lot