mozack/abra

Weird error ...

Closed this issue · 4 comments

Whenever I run ABRA, I get the following error. Could you please look into it? Thanks.
Loading native library from: /scratch/BREIGR0124_VAxg_01_1408120760/libAbra.so
Loading reference map: /site/ne/home/wings/ref_data/reference_genome/hg19/chrUn_included/ucsc.hg19.fasta
Done loading ref map. Elapsed secs: 112
Fri Aug 15 12:41:14 EDT 2014 : Reading Input SAM Header and identifying read length
Fri Aug 15 12:41:14 EDT 2014 : Identifying header and determining read length
Min insert length: 0
Max insert length: 240721460
Fri Aug 15 12:42:47 EDT 2014 : Max read length is: 100
Fri Aug 15 12:42:47 EDT 2014 : Min contig length: 101
Fri Aug 15 12:42:47 EDT 2014 : Read length: 100
Fri Aug 15 12:42:47 EDT 2014 : Loading target regions
Exception in thread "main" java.lang.NumberFormatException: For input string: "+"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:484)
at java.lang.Integer.parseInt(Integer.java:527)
at abra.RegionLoader.load(RegionLoader.java:42)
at abra.ReAligner.getRegions(ReAligner.java:784)
at abra.ReAligner.loadRegions(ReAligner.java:794)
at abra.ReAligner.reAlign(ReAligner.java:122)
at abra.ReAligner.run(ReAligner.java:1282)
at abra.Abra.main(Abra.java:12)

My Command:
java -Xmx16g -jar ${ABRA_JAR} \
--in ${sample_id}.all.sorted.dedup.bam \
--out ${sample_id}.all.sorted.dedup.realigned.bam \
--ref ${reference_genome} \
--targets ${!target_bed_file_path} \
--threads 4 --mad 20000 --mbq 27 \
--working ${temp_dir}

ABRA is looking for an optional kmer size in the fourth column of the targets file. Please create a bed file with only the first 3 columns. I'll put handling this more elegantly on the todo list.
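One way to produce the 3-column file suggested above is a small script like the following. This is only a sketch: the function name `trim_bed_to_three_columns` and the file names in the usage note are made up for illustration, and it assumes that `track`, `browser`, and `#` lines can simply be dropped.

```python
def trim_bed_to_three_columns(src, dst):
    """Copy a BED file, keeping only chrom, start and end on each line.

    Returns the number of region lines written.
    """
    kept = 0
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            stripped = line.strip()
            # Assumption: blank lines and BED header/comment lines are not needed.
            if not stripped or stripped.startswith(("#", "track", "browser")):
                continue
            # Split on any whitespace; the first three columns never contain spaces.
            fields = stripped.split(None, 3)
            if len(fields) < 3:
                raise ValueError(f"Expected at least 3 columns, got: {line!r}")
            fout.write("\t".join(fields[:3]) + "\n")
            kept += 1
    return kept
```

For example, `trim_bed_to_three_columns("targets.bed", "targets.3col.bed")` writes the trimmed copy, which can then be passed to `--targets`.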

Also, the --mbq param is a threshold on the positional sum of base qualities, so a value of 27 is likely to generate a lot of noise during assembly. I'd also recommend using a much smaller value for --mad.
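To make the "positional sum" idea concrete, here is a toy comparison against a plain per-base threshold. It is an interpretation of the comment above, not ABRA's actual code: it assumes the qualities of every observation of a k-mer are summed position by position and that each sum must reach the threshold. All function names are hypothetical.

```python
def positional_quality_sums(observations):
    """Sum base qualities column by column across all observations of one k-mer."""
    k = len(observations[0])
    if any(len(quals) != k for quals in observations):
        raise ValueError("all observations must have the same length")
    return [sum(column) for column in zip(*observations)]


def passes_single_base_threshold(observations, threshold):
    """Per-base reading: every individual base quality must reach the threshold."""
    return all(q >= threshold for quals in observations for q in quals)


def passes_positional_sum_threshold(observations, mbq):
    """Positional-sum reading (assumed): every summed position must reach mbq."""
    return all(total >= mbq for total in positional_quality_sums(observations))


one_read = [[30, 30, 30]]                   # a k-mer seen once, all bases Q30
two_reads = [[30, 30, 30], [32, 31, 35]]    # the same k-mer seen twice

print(passes_positional_sum_threshold(one_read, 27))   # True
print(passes_positional_sum_threshold(one_read, 60))   # False
print(passes_positional_sum_threshold(two_reads, 60))  # True
```

Under this reading, a threshold of 27 lets a k-mer through on the strength of a single ordinary-quality read, which is consistent with the warning about noise, whereas 60 needs roughly two good observations per position.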

Thank you so much for your reply and suggestions.

I thought --mbq was a single-base quality score threshold. Do you have any other suggestions regarding parameter settings?

Many thanks,
Joon

Optimal settings will ultimately depend on your data. I've recently changed the defaults for --mad and --mbq to 150 and 60, respectively, and those should be a good starting point. That change hasn't been released yet, though. If you're dealing with much lower coverage (say 15X) and wish to detect lower frequency somatic variation, you might experiment with throttling mbq back down to the 40 range.

... and if you're dealing with very high depth, you may wish to increase mnf and mbq to prune the assembly graph more aggressively.
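Since the new defaults were not yet released at the time of this thread, one option is to pass the suggested values explicitly. The helper below is a sketch only: `build_abra_command` and its argument names are hypothetical, and it uses only the flags that appear in the command earlier in this thread.

```python
def build_abra_command(jar, bam_in, bam_out, ref, targets, working,
                       threads=4, mad=150, mbq=60, heap="16g"):
    """Assemble the ABRA command line as an argument list.

    mad=150 and mbq=60 are the starting points suggested in this thread;
    for low coverage (around 15X), try mbq in the 40 range instead.
    """
    return [
        "java", f"-Xmx{heap}", "-jar", jar,
        "--in", bam_in,
        "--out", bam_out,
        "--ref", ref,
        "--targets", targets,   # BED file with only chrom, start, end
        "--threads", str(threads),
        "--mad", str(mad),
        "--mbq", str(mbq),
        "--working", working,
    ]
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`; building it as a list avoids shell quoting problems with paths.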