mozack/abra2

YO tag format

ionox0 opened this issue · 6 comments

Thank you for your work in developing this tool,

We are wondering about samples which have YO:Z:N/A for the YO tag. Is this expected behavior and if so would you be able to provide any detail on the reason for an N/A value?

Thank you!

There is also the case where the tag is YO:Z:N/A:12345, with one instance of N/A.

Does this mean that the previous contig was N/A and the previous position of the read was 12345?

Hi, the YO tag stores the original alignment information and N/A stands for Not Applicable. This means that the read was originally unaligned. If you're using bwa, when one read in a pair is unmapped it is assigned to the mate's locus. The number after N/A is set to the read's original position (i.e mate locus) and is used downstream for sorting.

Thank you @mozack for the information.

Now that I understand the reason behind the N/A for the YO tag, it seems there may be an issue with the YO tag parsing on this line:

https://github.com/mozack/abra2/blob/master/src/main/java/abra/SortedSAMWriter.java#L107

Here is the stack trace from our error:

INFO	Thu Apr 04 15:26:51 EDT 2019	PROCESS_REGION_MSECS:	2_158655846_158655966	3676	63	11	0
INFO	Thu Apr 04 15:26:55 EDT 2019	PROCESS_REGION_MSECS:	1_11187651_11187891	5148	13	22	0
INFO	Thu Apr 04 15:26:55 EDT 2019	PROCESS_REGION_MSECS:	2_61725997_61726117	113	63	14	0
INFO	Thu Apr 04 15:26:55 EDT 2019	PROCESS_REGION_MSECS:	1_11188001_11188241	167	10	18	0
INFO	Thu Apr 04 15:26:55 EDT 2019	PROCESS_REGION_MSECS:	2_95849218_95849383	115	23	13	0
INFO	Thu Apr 04 15:26:55 EDT 2019	PROCESS_REGION_MSECS:	1_156846157_156846397	311	21	20	0
INFO	Thu Apr 04 15:26:56 EDT 2019	Clock time in Chromosome: 2_125000001_150000000: 35
INFO	Thu Apr 04 15:26:56 EDT 2019	Processing chromosome chunk: 2_175000001_200000000
INFO	Thu Apr 04 15:26:56 EDT 2019	Clock time in Chromosome: 1_100000001_125000000: 57
INFO	Thu Apr 04 15:26:56 EDT 2019	Processing chromosome chunk: 2_200000001_225000000
java.lang.ArrayIndexOutOfBoundsException: 1
	at abra.SortedSAMWriter.addAlignment(SortedSAMWriter.java:107)
	at abra.ReAligner.remapReads(ReAligner.java:757)
	at abra.ReAligner.processChromosomeChunk(ReAligner.java:404)
	at abra.ReAlignerRunnable.go(ReAlignerRunnable.java:21)
	at abra.AbraRunnable.run(AbraRunnable.java:20)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
java.lang.ArrayIndexOutOfBoundsException: 1
	at abra.SortedSAMWriter.addAlignment(SortedSAMWriter.java:107)
	at abra.ReAligner.remapReads(ReAligner.java:757)
	at abra.ReAligner.processChromosomeChunk(ReAligner.java:404)
	at abra.ReAlignerRunnable.go(ReAlignerRunnable.java:21)
	at abra.AbraRunnable.run(AbraRunnable.java:20)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
INFO	Thu Apr 04 15:26:56 EDT 2019	PROCESS_REGION_MSECS:	2_178098685_178099045	239	10	21	0

My guess is that there should be a check for N/A before trying to parse this field, similar to how it is handled here:

if (yo.startsWith("N/A")) {

Let me know if I can provide more information.

I should note that we are attempting to run Abra two times, in order to find out whether an additional realignment step might give better results. This is possibly not an intended use case

Two runs with the current version of ABRA2 should be OK, although I doubt the results will improve much.

If you're trying to run ABRA2 against output from an older ABRA version, you may run into trouble as the format has changed in the YO tag.

The exception above appears to be during the sort phase. You could try disabling the sort using option:
--nosort
and sort the BAM file after ABRA2 completes.

However, please keep in mind that I have not explicitly tested running ABRA2 against older versions of ABRA.

Thank you very much for the info, we'll be sticking with ABRA2 to keep the YO tag format consistent.