broadinstitute/Drop-seq

Error in PolyATrimmer

Closed this issue · 2 comments

My code to execute is like this:

echo "5' trimming"
TrimStartingSequence \
  INPUT=tmp/${name}_sort.bam \
  OUTPUT=tmp/${name}_sort_5trimmed.bam \
  OUTPUT_SUMMARY=tmp/${name}_adapter_trimming_report.txt \
  SEQUENCE=AAGCAGTGGTATCAACGCAGAGTGAATGGG \
  MISMATCHES=0 \
  NUM_BASES=5
echo "3' trimming"
PolyATrimmer \
  INPUT=tmp/${name}_sort_5trimmed.bam \
  OUTPUT=tmp/${name}_sort_5trimmed_3trimmed.bam \
  OUTPUT_SUMMARY=tmp/${name}_polyA_trimming_report.txt \
  MISMATCHES=0 \
  NUM_BASES=6 \
  USE_NEW_TRIMMER=true
echo "trimming done"

And the error message is:

[Wed May 05 10:32:44 CST 2021] org.broadinstitute.dropseqrna.readtrimming.PolyATrimmer done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=1012924416
Exception in thread "main" java.lang.NullPointerException
at htsjdk.samtools.util.SequenceUtil.reverseComplement(SequenceUtil.java:879)
at htsjdk.samtools.util.SequenceUtil.reverseComplement(SequenceUtil.java:115)
at org.broadinstitute.dropseqrna.readtrimming.AdapterDescriptor$TagAdapterElement.getSequence(AdapterDescriptor.java:70)
at org.broadinstitute.dropseqrna.readtrimming.AdapterDescriptor.getAdapterSequence(AdapterDescriptor.java:117)
at org.broadinstitute.dropseqrna.readtrimming.PolyAWithAdapterFinder.getPolyAStart(PolyAWithAdapterFinder.java:61)
at org.broadinstitute.dropseqrna.readtrimming.PolyATrimmer.doWork(PolyATrimmer.java:127)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
at org.broadinstitute.dropseqrna.cmdline.DropSeqMain.main(DropSeqMain.java:42)

I got a 0-byte ${name}_sort_5trimmed_3trimmed.bam and a 0-byte ${name}_polyA_trimming_report.txt in the end.

alecw commented

Hi @AweiIvy ,
Because you didn't specify the ADAPTER option on the command line, you're using the default, which is
~XM~XCACGTACTCTGCGTTGCTACCACTG
The beginning of this, i.e. ~XM~XC means "get the value of XM tag on the read, reverse complement it; get the value of XC tag on the read, reverse complement it, and prepend both of those to the rest of the sequence."
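
To make the expansion described above concrete, here is a minimal shell sketch of how the default adapter would be built for a single read. It is an illustration only, not the Drop-seq implementation: the helper names revcomp and expand_default_adapter are hypothetical, and the XM and XC values in the example call are made up.

```shell
# Reverse complement a DNA string: reverse it, then swap complementary bases.
revcomp() {
  printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'
}

# Hypothetical helper: build the per-read adapter from the read's XM and XC
# tag values, following the order given in the description of ~XM~XC:
# reverse-complemented XM, then reverse-complemented XC, then the fixed part.
expand_default_adapter() {
  xm="$1"
  xc="$2"
  printf '%s%s%s\n' "$(revcomp "$xm")" "$(revcomp "$xc")" "ACGTACTCTGCGTTGCTACCACTG"
}

# Made-up tag values, for illustration only.
expand_default_adapter AACC GGGT
# prints GGTTACCCACGTACTCTGCGTTGCTACCACTG
```

If a read has no XM or XC tag, there is nothing to reverse complement, which is consistent with the NullPointerException in SequenceUtil.reverseComplement shown in the stack trace.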

Does your input have XM and XC tags on every read? If not, you'll get the error you reported.
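
One way to check this is to count the reads that lack either tag. The sketch below is an assumption-laden example rather than part of Drop-seq: count_reads_missing_tags is a hypothetical helper that reads SAM text on stdin, and the commented usage line assumes samtools is installed and uses the input path from the question.

```shell
# Hypothetical helper: read SAM records on stdin and print how many of them
# are missing at least one of the two tags given as arguments.
count_reads_missing_tags() {
  awk -F '\t' -v t1="$1" -v t2="$2" '
    /^@/ { next }                       # skip SAM header lines
    {
      has1 = 0; has2 = 0
      for (i = 12; i <= NF; i++) {      # optional fields start at column 12
        if (index($i, t1 ":") == 1) has1 = 1
        if (index($i, t2 ":") == 1) has2 = 1
      }
      if (!has1 || !has2) missing++
    }
    END { print missing + 0 }
  '
}

# Usage (assumes samtools is available):
# samtools view "tmp/${name}_sort_5trimmed.bam" | count_reads_missing_tags XM XC
```

A result of 0 means every read carries both tags; any other number means the default ADAPTER value cannot be expanded for those reads.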

Regards, Alec

Thanks @alecw !
That's exactly the problem. My BAM file contains UB and CB tags instead of XM and XC tags. I changed the adapter sequence to UBCB and there is no error anymore.
Thanks again!