bcgsc/arks

Identical read headers in demo data

Closed this issue · 2 comments

In the interleaved FASTQ file Examples/arks_test-demo/test_reads.fq.gz, both mates of a pair have identical read headers:

zless Examples/arks_test-demo/test_reads.fq.gz | head -n 8
@gi|453232067|ref|NC_003281.10|_1491891_1491677_1_0_0_0_0:0:0_0:0:0_1e04b/1 BX:Z:TATATCCTCTAGCTAG-1
GACTTGTGTGCTTCAACTCAGAGTTTTTGGCATAAAAACTGTAGTTATTCACATTTTCAGTCATAAATTGGCTAAAAATGTCAATAAATACCGTATTTCTGCCAACAATTTGACAATAGAGCACATTT
+
GEFIFEAEEFBIDFDDCCDCB?ECCCBDDC?ECDCDA?ADABCABAFABC@@@@B<BC@@@<@>@=?>??<>BC?BEA@>=??=A?<=A>>>???<>>><>>>A@=;=?><A<>==;=?<<?=A?>>=
@gi|453232067|ref|NC_003281.10|_1491891_1491677_1_0_0_0_0:0:0_0:0:0_1e04b/1 BX:Z:TATATCCTCTAGCTAG-1
CTGAAATTTGGAAAAAATTGGGTTACTGTAGCTTCGAAAGTACGCAAACACCGAATTTTACGGAATTTCGCACAAAATGGCTGAAAAATGGGCTTTTAAATGTGAAAACGGCTGGAGAACTCAAATTTTTTCATTGAAAATCTGTGAAAAT
+
IIHGHGFGFIFGIFEEEIEDDDDDEDBDCEBECDCBBBC@EFBB@CCACB@@AAAB?@BA@F@@A?AC?D?BA@??BABBBA?B>A@@???><?>><@C?@??@>?==@=D?>??=@>=><<@=>>;>@>>=@=><>;>>==<@?===>==

Should these instead be:

@gi|453232067|ref|NC_003281.10|_1491891_1491677_1_0_0_0_0:0:0_0:0:0_1e04b/1 BX:Z:TATATCCTCTAGCTAG-1
@gi|453232067|ref|NC_003281.10|_1491891_1491677_1_0_0_0_0:0:0_0:0:0_1e04b/2 BX:Z:TATATCCTCTAGCTAG-1

i.e. suffixing the second read with /2 instead of /1?

Also, does the -1 on the end of the barcode serve any purpose?

Thanks

Hello,

The "/1" is just part of the read name in this dataset: a quirk of a previous version of LRSim, which was used to simulate these Chromium reads. I realize that is confusing and against convention, so I have updated the test reads file to omit this suffix. The interleaved mates still have identical names, but without the "/1".
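If you want to confirm that the mates in your own interleaved file share a read name, a minimal Python sketch along these lines should work. It is not part of ARKS; the function names are hypothetical, and it assumes strict 4-line FASTQ records with mates stored back to back.

```python
import gzip

def read_name(header_line):
    # Read name = first whitespace-delimited token, minus the leading "@".
    return header_line.strip().split()[0][1:]

def mismatched_pairs(lines):
    """Yield (name1, name2) for interleaved pairs whose read names differ.

    Assumes 4-line FASTQ records, so every 4th line (0-based) is a header.
    """
    headers = [line for i, line in enumerate(lines) if i % 4 == 0]
    for first, second in zip(headers[0::2], headers[1::2]):
        name1, name2 = read_name(first), read_name(second)
        if name1 != name2:
            yield name1, name2

def check_file(path):
    # Hypothetical helper: open a gzipped interleaved FASTQ and list mismatches.
    with gzip.open(path, "rt") as handle:
        return list(mismatched_pairs(handle))
```

Calling check_file("Examples/arks_test-demo/test_reads.fq.gz") returns an empty list when every pair of mates has the same name.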

As for the "-1" at the end of the barcode, it is added by Longranger basic and can be used to indicate different read groups. ARKS reads all non-space characters after the "BX:Z:" tag in the read header as the barcode, so if you have several Chromium libraries, changing this number distinguishes barcodes that share a sequence but come from different libraries.
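To illustrate the barcode rule described above, here is a rough Python sketch. It is not the ARKS source code; the regular expression and the function name are assumptions that simply mirror "all non-space characters after BX:Z:".

```python
import re

# Assumption: the barcode is every non-space character following "BX:Z:".
BX_PATTERN = re.compile(r"BX:Z:(\S+)")

def extract_barcode(header):
    """Return the barcode string from a read header, or None if no BX tag."""
    match = BX_PATTERN.search(header)
    return match.group(1) if match else None

# Same barcode sequence, different suffix: treated as two distinct barcodes.
lib1 = extract_barcode("@read1 BX:Z:TATATCCTCTAGCTAG-1")
lib2 = extract_barcode("@read2 BX:Z:TATATCCTCTAGCTAG-2")
print(lib1, lib2, lib1 == lib2)
```

Because the suffix is part of the extracted string, "TATATCCTCTAGCTAG-1" and "TATATCCTCTAGCTAG-2" do not compare equal, which is what keeps libraries apart.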

Hope that helps and thanks for pointing this out!
Lauren

Closing this issue -- please feel free to re-open if you have any further questions.