Exception in thread "main" org.broadinstitute.dropseqrna.TranscriptomeException: Base [13] was requested, but the read isn't long enough [GGAC]
CherryX727 opened this issue · 4 comments
Hi, I'm trying to run the TagBamWithReadSequenceExtended step, but I get the following error, and the output file size is 0. I don't know how to solve it.
org.broadinstitute.dropseqrna.utils.TagBamWithReadSequenceExtended done. Elapsed time: 33.82 minutes.
Runtime.totalMemory()=3821535232
Exception in thread "main" org.broadinstitute.dropseqrna.TranscriptomeException: Base [13] was requested, but the read isn't long enough [GGAC]
at org.broadinstitute.dropseqrna.utils.BaseQualityFilter.scoreBaseQuality(BaseQualityFilter.java:45)
at org.broadinstitute.dropseqrna.utils.TagBamWithReadSequenceExtended.processSingleRead(TagBamWithReadSequenceExtended.java:164)
at org.broadinstitute.dropseqrna.utils.TagBamWithReadSequenceExtended.doWork(TagBamWithReadSequenceExtended.java:132)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:308)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
at org.broadinstitute.dropseqrna.cmdline.DropSeqMain.main(DropSeqMain.java:42)
Here is the command I used:
./TagBamWithReadSequenceExtended -INPUT /home/data/t050205/software/DropSeq_tools/test_data/GSM1544798_SpeciesMix_ThousandSTAMPs.bam \
-OUTPUT /home/data/t050205/software/DropSeq_tools/test_data/GSM1544798_SpeciesMix_ThousandSTAMPs_tagged_CellMolecular.bam \
-SUMMARY /home/data/t050205/software/DropSeq_tools/test_data/GSM1544798_SpeciesMix_ThousandSTAMPs_tagged_Molecular.bam_summary.txt \
-TMP_DIR /home/data/t050205/software/DropSeq_tools/test_data/ \
-BASE_RANGE 13-20 \
-BASE_QUALITY 10 \
-BARCODED_READ 1 \
-DISCARD_READ True \
-TAG_NAME XM \
-NUM_BASES_BELOW_QUALITY 1
Thanks for your answer.
Hi @CherryX727 ,
Every read 1 in your input BAM needs to be at least 20 bases long, because -BASE_RANGE 13-20 extracts bases 13 through 20. It appears that for at least one read pair, read 1 is only 4 bases long: GGAC.
I'm guessing there is a problem with the process that produced the input to this program. I think you need to investigate that process. Note that you can use samtools view to examine the input BAM.
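A minimal sketch of that check, assuming the records come from samtools view as SAM text (tab-separated, FLAG in column 2, SEQ in column 10). It treats BASE_RANGE as 1-based and inclusive, so 13-20 needs at least 20 bases. The colon-separated multi-range form and the rule that unpaired records count as the barcoded read are assumptions, and find_short_barcoded_reads is a hypothetical helper, not part of Drop-seq tools.

```python
from typing import Iterable, Iterator, Tuple

PAIRED = 0x1          # SAM FLAG bit: template has multiple segments
FIRST_OF_PAIR = 0x40  # SAM FLAG bit: first segment in the template (read 1)


def required_length(base_range: str) -> int:
    """Minimum read length for a 1-based, inclusive BASE_RANGE such as "13-20".

    Assumption: multiple ranges, if given, are joined with ':' (e.g. "1-12:13-20").
    """
    return max(int(part.split("-")[1]) for part in base_range.split(":"))


def find_short_barcoded_reads(
    sam_lines: Iterable[str], base_range: str
) -> Iterator[Tuple[str, str]]:
    """Yield (QNAME, SEQ) for barcoded-read records too short for base_range.

    Assumption: a record is treated as the barcoded read if it is read 1 of a
    pair or is unpaired.
    """
    needed = required_length(base_range)
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines (present with samtools view -h)
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag, seq = fields[0], int(fields[1]), fields[9]
        is_barcoded_read = bool(flag & FIRST_OF_PAIR) or not (flag & PAIRED)
        if is_barcoded_read and len(seq) < needed:
            yield qname, seq
```

For example, piping samtools view input.bam into a small script that calls find_short_barcoded_reads(sys.stdin, "13-20") and prints each hit would list the offending reads, such as the 4-base GGAC read.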
Regards, Alec
Hi @alecw ,
Thank you for your answer. You are right: I used samtools to view the input BAM, and some reads are only 4 bases long. What should I do with these reads?
The BAM I used is the supplementary file of GSM1544798 (SpeciesMix_ThousandSTAMPs_50cellspermicroliter). I checked the size of the downloaded file, and it matched the size listed on the website. Could there be a problem with the download? If not, how can I make sure that the input BAM is valid for this step?
Hi @CherryX727 ,
This BAM has already been processed by the Drop-seq tools. The cellular and molecular barcodes have already been extracted from read 1 and applied as tags, and the reads are aligned. It doesn't make sense to run TagBamWithReadSequenceExtended on this BAM.
I looked at the first read in the BAM:
% samtools view GSM1544798_SpeciesMix_ThousandSTAMPs.bam | head -1
NS500217:67:H14GMBGXX:3:22409:13341:7514 0 HUMAN_1 14283 0 42M8S * 0 0 GGTCAGCTGGGAGCTTCTGCCCCCACTGCCTAAAAACCAACAAAAACAAA A<AAAAAAF))AAA<FF<FA)7<7AA..AF7...)FF.<FAAA..)A7.< XC:Z:AGGCAATAGAAC XF:Z:INTERGENIC PG:Z:STAR RG:Z:A NH:i:6 NM:i:3 XM:Z:GATGCCTT UQ:i:34 AS:i:35
As you can see, cellular and molecular indices are in XC and XM tags, there is alignment information on the read, and it is no longer a paired read.
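The three signs above (XC/XM tags present, alignment information, no pairing) can be checked per record with a short heuristic. This is a minimal sketch over SAM text from samtools view; looks_already_processed is a hypothetical helper, the default tag names XC and XM are taken from the record shown above, and "aligned" here simply means the unmapped FLAG bit is clear and the CIGAR is not *.

```python
from typing import Dict, List

PAIRED = 0x1    # SAM FLAG bit: template has multiple segments
UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped


def parse_tags(fields: List[str]) -> Dict[str, str]:
    """Map optional SAM fields (TAG:TYPE:VALUE, columns 12 onward) to {TAG: VALUE}."""
    tags = {}
    for field in fields[11:]:
        tag, _type, value = field.split(":", 2)
        tags[tag] = value
    return tags


def looks_already_processed(
    sam_line: str, cell_tag: str = "XC", molecular_tag: str = "XM"
) -> Dict[str, bool]:
    """Report whether a record already carries barcode tags, is aligned, and is paired."""
    fields = sam_line.rstrip("\n").split("\t")
    flag, cigar = int(fields[1]), fields[5]
    tags = parse_tags(fields)
    return {
        "has_cell_barcode": cell_tag in tags,
        "has_molecular_barcode": molecular_tag in tags,
        "aligned": (not (flag & UNMAPPED)) and cigar != "*",
        "paired": bool(flag & PAIRED),
    }
```

For the first record above, all of the barcode and alignment checks come back true and "paired" comes back false, which is the pattern of a BAM that has already been through the pipeline rather than a raw input for TagBamWithReadSequenceExtended.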
Closed as there was no further response.