amplab/snap

-so option seems to sort in reverse order

haiqu opened this issue · 7 comments

haiqu commented

Version 1.0.0 code, Windows 64-bit.

Today I created a GRCh38-aligned file from Dante Labs .fasta input, with the sort option (-so) applied.
On checking the files at https://qual.iobio.io/, I found two strange issues compared with the hs37d5 files supplied by Dante:

  1. Chromosomes appear in the reverse order, with ChrY at top and Chr1 at bottom.
  2. The first column of the report is full of rubbish.

Not sure whether the second issue is at their end, but the first almost certainly isn't.
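A coordinate-sorted SAM/BAM orders records by RNAME, following the order of the @SQ header lines, and then by POS. One way to check whether the records themselves break that order, or whether only the display looks reversed, is to walk the records and compare each (reference rank, position) key with the previous one. The sketch below is a hypothetical helper, not part of SNAP. It assumes header and alignment lines as plain SAM text (for example from `samtools view -h`), and it places unplaced unmapped records (RNAME `*`) last.

```python
def sq_rank(header_lines):
    """Map each @SQ SN: name to its position among the header's @SQ lines."""
    rank = {}
    for line in header_lines:
        if line.startswith("@SQ\t"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    rank[field[3:]] = len(rank)
                    break
    return rank


def first_out_of_order(header_lines, records):
    """Return the index of the first record that breaks coordinate order, or None.

    Hypothetical helper: order is RNAME by @SQ header order, then POS,
    with unplaced unmapped records (RNAME "*") sorted last.
    """
    rank = sq_rank(header_lines)
    unmapped_rank = len(rank)
    prev = (-1, -1)
    for i, line in enumerate(records):
        cols = line.rstrip("\n").split("\t")
        rname, pos = cols[2], int(cols[3])
        key = (unmapped_rank if rname == "*" else rank[rname], pos)
        if key < prev:
            return i
        prev = key
    return None


# Illustrative data: two GRCh38 contigs and minimal 4-column records.
header = [
    "@HD\tVN:1.6\tSO:coordinate",
    "@SQ\tSN:chr1\tLN:248956422",
    "@SQ\tSN:chrY\tLN:57227415",
]
sorted_records = ["r1\t0\tchr1\t100", "r2\t0\tchr1\t250", "r3\t0\tchrY\t50", "r4\t4\t*\t0"]
reversed_records = ["r3\t0\tchrY\t50", "r1\t0\tchr1\t100"]

print(first_out_of_order(header, sorted_records))    # None
print(first_out_of_order(header, reversed_records))  # 1
```

Note that if the @SQ lines are themselves in reverse order, the records can be consistent with the header and this check will pass; the header order then has to be compared against the reference separately.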

haiqu commented

Additional data:

  1. Attempting to examine the files at https://bam.iobio.io/ failed unexpectedly.
  2. Attempting to use the files in WGSExtract failed unexpectedly.

haiqu commented

I've built 1.0.2 and will test the output in both qual.iobio and WGSExtract. If there are any further issues, I'll let you know. Thanks for the tip about -bSpace; I seem to have missed that in the docs.

Rob

haiqu commented

Hi Bill,

Ah, that's more like it. Testing in WGSExtract was also successful, so this issue is technically resolved. I'd like to change the sorting issue to a feature request, though, since having the chromosomes in reverse order triggers my CDO.[1]

I have no opinion about the alt contigs, because I don't use them. As Heng Li wrote[2] in 2017:

"Inclusion of ALT contigs. ALT contigs are large variations with very long flanking sequences nearly identical to the primary human assembly. Most read mappers will give mapping quality zero to reads mapped in the flanking sequences. This will reduce the sensitivity of variant calling and many other analyses. You can resolve this issue with an ALT-aware mapper, but no mainstream variant callers or other tools can take the advantage of ALT-aware mapping."
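To see whether a reference or BAM header includes ALT contigs at all, and how many mapped reads end up with mapping quality zero, a rough check is sketched below. It assumes UCSC-style GRCh38 contig names, where ALT contigs end in `_alt` (for example `chr6_GL000250v2_alt`); other naming schemes, or an ALT list supplied to the mapper separately, would need a different test. The helpers are hypothetical and only count MAPQ 0 records; they do not perform or emulate ALT-aware mapping.

```python
def alt_contigs(names):
    """Return contig names that look like ALT contigs.

    Assumption: UCSC-style GRCh38 naming, where ALT contigs end in "_alt".
    """
    return [n for n in names if n.endswith("_alt")]


def mapq0_fraction(sam_lines):
    """Fraction of mapped records whose MAPQ (column 5) is 0."""
    mapped = zero = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        cols = line.rstrip("\n").split("\t")
        flag, mapq = int(cols[1]), int(cols[4])
        if flag & 0x4:
            continue  # FLAG bit 0x4: segment unmapped
        mapped += 1
        zero += mapq == 0
    return zero / mapped if mapped else 0.0


names = ["chr1", "chr6", "chr6_GL000250v2_alt", "chrUn_GL000220v1", "chrM"]
records = ["r1\t0\tchr1\t100\t60", "r2\t0\tchr1\t200\t0", "r3\t4\t*\t0\t0"]

print(alt_contigs(names))       # ['chr6_GL000250v2_alt']
print(mapq0_fraction(records))  # 0.5
```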

Rob

[1] It's like OCD, but in the correct alphabetical order.
[2] https://lh3.github.io/2017/11/13/which-human-reference-genome-to-use

Turns out that, spec notwithstanding, the Dragen variant caller does require the @SQ lines in the BAM header to be in the same order as the contigs in the reference FASTA, so I implemented it. It'll be in the next release. So your OCD can relax the smallest amount. :-)
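Before handing a BAM to a tool that requires this, you can compare the @SQ order in the header (for example the output of `samtools view -H sample.bam`) with the sequence order in the reference's FASTA index (the `.fai` file written by `samtools faidx`). The first column of a `.fai` line is the sequence name, in FASTA order. The sketch below is a hypothetical pre-flight check, not SNAP or Dragen code; the `.fai` offsets in the example assume bare `>chrN` header lines and 60-base sequence lines.

```python
def sq_names(header_text):
    """Contig names from @SQ lines (SN: tag), in header order."""
    names = []
    for line in header_text.splitlines():
        if line.startswith("@SQ\t"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
                    break
    return names


def fai_names(fai_text):
    """Sequence names from a FASTA index (.fai); column 1 is the name."""
    return [line.split("\t")[0] for line in fai_text.splitlines() if line.strip()]


def header_matches_reference(header_text, fai_text):
    """True if the @SQ lines list exactly the reference contigs, in FASTA order."""
    return sq_names(header_text) == fai_names(fai_text)


fai = "chr1\t248956422\t6\t60\t61\nchr2\t242193529\t253105708\t60\t61\n"
header_ok = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:248956422\n@SQ\tSN:chr2\tLN:242193529\n"
header_reversed = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr2\tLN:242193529\n@SQ\tSN:chr1\tLN:248956422\n"

print(header_matches_reference(header_ok, fai))        # True
print(header_matches_reference(header_reversed, fai))  # False
```

Comparing the full name lists, rather than only their relative order, also flags contigs that are present in one file and missing from the other.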

Fixed in 2.0.