amplab/snap

Sorting error

tylerjkennedy opened this issue · 12 comments

Sorry to bother you again with another issue, but I'm having trouble with the sorting option (-so).

If I align the PE reads there is no issue, but if I add in the -so flag like this:

snap-aligner paired index-directory pair1.fastq.gz pair2.fastq.gz -so -o alignment.bam

I get:
Welcome to SNAP version 1.0dev.104.
Loading index from directory... 0s. 100290401 bases, seed size 27
Aligning.
sorting...Read name: . Size of BAM record 36 larger than allocated 4
SNAP exited with exit code 1 from line 550 of file SNAPLib/Bam.cpp

I tried adding the sort memory flag ("-sm 40") after -so, but got the same error. Do you know how I can fix this and get my alignments sorted and indexed?

Thank you,

Tyler

Could you share a small subset of reads from your fastq files so that we can reproduce the error?

If you are unable to share, you can try a few things:
(1) Just to confirm: does aligning the reads without the -so option (only "-o alignment.bam") work for you?
(2) Does sorting the alignments to SAM format (-so -o alignment.sam) work?

  1. yes, aligning without the -so option works fine.
  2. sorting to SAM format gave this error:
    Welcome to SNAP version 1.0dev.104.
    Loading index from directory... 0s. 100290401 bases, seed size 27
    Aligning.
    sorting...Segmentation fault: 11

I'll try and create a subset fastq for you to try now.

I created a subset of the first 200 reads from each of the 2 PE read files (it won't let me attach them through this chat; should I email them to you?) and ran it with the -so option. This worked fine, and the output was a sorted bam file.
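Taking the first N reads from each mate file can be scripted. The sketch below is a minimal, hypothetical helper (`subset_fastq` is not part of SNAP) that assumes the usual 4-line-per-record FASTQ layout with no wrapped sequence lines. It reads gzipped or plain input and writes gzipped output when the destination ends in `.gz`, so a subset can keep the same compression as the full files.

```python
import gzip
import itertools


def _open(path, mode):
    # Use gzip for *.gz paths and the builtin open() otherwise.
    return gzip.open(path, mode) if path.endswith(".gz") else open(path, mode)


def subset_fastq(src, dst, n_reads):
    """Copy the first n_reads FASTQ records from src to dst.

    Assumes 4 lines per record (header, sequence, '+', qualities).
    """
    with _open(src, "rt") as fin, _open(dst, "wt") as fout:
        # 4 lines per record, so stop after 4 * n_reads lines.
        for line in itertools.islice(fin, 4 * n_reads):
            fout.write(line)
```

Calling it with the same `n_reads` on both mates (for example `subset_fastq("pair1.fastq.gz", "sub1.fastq.gz", 1000000)` and likewise for `pair2.fastq.gz`) keeps the pairs in step, since the two files list mates in the same order. The output names here are only illustrative.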

Could you send a subset of reads that fails to produce the sorted bam? You can email them to arunsub@umich.edu or upload them here: https://www.dropbox.com/request/MFYrgaqTy8VW1KsGGpMm.

I tried subsetting the first 1,000,000 reads, and they still produce a sorted bam. The only differences between the subsets and the parent read files are the size (each of the 2 PE files is ~5 GB) and that the subsets aren't gz compressed. Would you like me to upload the 10 GB of read data to the Dropbox folder?
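To double-check that a "sorted bam" really is in coordinate order, the records can be walked directly. The sketch below is a hypothetical helper (`bam_is_coordinate_sorted` is not a SNAP or samtools function) based on the BAM layout from the SAM/BAM specification: the magic `BAM\1`, the header text, the reference list, then length-prefixed records whose first two int32 fields are `refID` and `pos`. It relies on Python's `gzip` module accepting BGZF, since BGZF blocks are ordinary gzip members. It is pure Python and will be slow on multi-GB files, so it suits the subsets better than the full data.

```python
import gzip
import struct


def bam_is_coordinate_sorted(path):
    """Return True if the BAM records at path are in coordinate order."""
    with gzip.open(path, "rb") as f:
        if f.read(4) != b"BAM\x01":
            raise ValueError("not a BAM file (bad magic)")
        (l_text,) = struct.unpack("<i", f.read(4))
        f.read(l_text)  # skip the SAM header text
        (n_ref,) = struct.unpack("<i", f.read(4))
        for _ in range(n_ref):
            (l_name,) = struct.unpack("<i", f.read(4))
            f.read(l_name + 4)  # reference name, then its int32 length
        prev = (-1, -1)
        while True:
            size_bytes = f.read(4)
            if not size_bytes:
                return True  # clean end of file: every record was in order
            if len(size_bytes) != 4:
                raise ValueError("truncated BAM record length")
            (block_size,) = struct.unpack("<i", size_bytes)
            record = f.read(block_size)
            if len(record) != block_size:
                raise ValueError("truncated BAM record")
            ref_id, pos = struct.unpack("<ii", record[:8])
            # Unplaced unmapped reads (refID == -1) belong at the end.
            key = (ref_id if ref_id >= 0 else 2**31, pos)
            if key < prev:
                return False
            prev = key
```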

If you can upload the complete read set to dropbox that would be great. Let me know if you face any issues.

I just uploaded the files. I also tried running another, smaller set of PE data (~3 GB per file) and received this error:
Welcome to SNAP version 1.0dev.104.
Loading index from directory... 0s. 100290401 bases, seed size 27
Aligning.
sorting...SAMReader: POS field too long.
SNAP exited with exit code 1 from line 799 of file SNAPLib/SAM.cpp

Thanks! The run for the original data finished successfully on a Linux machine. I'm not sure yet, but it does look like an OS X-specific issue.

You can upload the smaller data as well and I will take a look after the first issue.

That error with the second set of files is for SAM output. With BAM output I get the same "Size of BAM record 36 larger than allocated 4" error as for the first set of files. I'll upload the second set in case you'd like to take a look at that error anyway.

Hi Tyler,

Sorry for the delay in getting back to you. I pushed a fix for the sorting issue to the os-x-sort-fix branch. Could you try it out when you get a chance? We will merge it into master once you validate it.

git clone -b os-x-sort-fix https://github.com/amplab/snap
cd snap
make

--Arun

Hi Arun,

Thank you for doing this. I'm a little caught up in other projects at the moment, but I will try to test out this fix in the next week or two.

Best,

Tyler

Hi,

I just ran this with my data and everything went smoothly.

Thank you again for all of your help!

Tyler