Is that possible to skip the SamtoFastq and map with bam file directly
ysu2015 opened this issue · 8 comments
Hi,
There is a SamtoFastq step after all the tagging and triming step before mapping with STAR. I tried to mapped directly with the $_unaligned_mc_tagged_polyA_filtered.bam instead of convert to Fastq first. The mapping step works fine. But when I go the the next step "sortSam", I got error like below.
"Exception in thread "main" htsjdk.samtools.SAMFormatException: Error parsing text SAM file. RG ID on SAMRecord not found in header:" I tried using "AddOrReplaceReadGroups" to revised the sam file, not work.
Any idea what could be get wrong?and any solutions? Thank you so much.
Y
Hi @ysu2015
AFAIK, STAR needs FASTQ input. I have no idea what STAR will do if you don't give it FASTQ input, and I don't understand why you would omit that step. I'm surprised STAR didn't fail. If you don't follow the instructions we provide, you're on your own.
Regards, Alec
Hi Alec,
Thank you for you quick response. STAR actually take bam file as input. (FYI, I used STAR 2.7.6a), and I just curious and think if we could skip the SamtoFastq step. And it turn out that STAR works fine. Just the next step did not work through.
Y
OK, I didn't realize that. Does it work if you run SamToFastq? If so, then I think the thing to do is to try it both ways, and see what is different between STAR output from BAM input, and STAR output from FASTQ input.
Good point. Will give a try.
Y
Hi Alex,
Here is what I got. FYI, they are not come from the same dataset.
So, if I run samTofastq and mapped with STAR, I got
NB551268:196:HTML7BGXC:1:11101:18609:6569 0 chr2 240137037 255 1S34M4S * 0 0 CATGTGGTCCATACACACAGTGGAACATTACTCAGTCGTAAAAAEEEEEAEEEEE/EEEEEEEEEEEEEEEEEEEEEE NH:i:1 HI:i:1 AS:i:27 nM:i:3
If I mapped directly with the tagged&Trimed bam file, I got
NB551268:214:HJ773BGXF:1:11101:10000:10876 16 chr10 48435445 255 66M * 0 0 TCAGATGTTTTGTTCATGATACTATCCTTCAGGGTTATGTGCTTATCAATGAAATAACCCCAGAGG EEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA NH:i:1 HI:i:1 AS:i:64 nM:i:0 XC:Z:GGGGGGGGGGGGGGGGGGGGGGGG RG:Z:Split XM:Z:GGGGGGGGGG
It looks like there are some tags been take over to the mapping results, which impact the downstream script.
Then when I look into the .merge.bam, which used as input for "TagReadWithGeneFunction", I found that those tags have been put back again, and added some others. (example like below). This make me think maybe I could use the "bam_direct_mapped_bam" as input for "TagReadWithGeneFunction", if those additional tags are not use for downstream analysis. What do you think? Thank you for discussion.
NB551268:196:HTML7BGXC:1:13108:12554:11558 16 chr1 10145 3 15S51M * 0 0 ATGGTGCTAAACAACAATCCCTAACCCTATCCCTATCCCTAACCCTAACCTAACCCTAACCCTAAC EAEEEEEAEEEEEEEEEEEEEEEEEEEA6EEEEEEEEEEEEEEEEEEEEEEE6EEEEEAEEAAAAA XC:Z:TCCGTCTAAGTGGTCAAGCAGAAT MD:Z:2C11A5A30 PG:Z:STAR RG:Z:Split NH:i:2 NM:i:3 XM:Z:TGCTTGCTGT UQ:i:108 AS:i:44
Oooo, still facing the error:
Exception in thread "main" htsjdk.samtools.SAMFormatException: Error parsing text SAM file. RG ID on SAMRecord not found in header
I guess, I need to reheader the bam file anyway.
Y
Hi @ysu2015 ,
It appears this is a flaw in STAR. Presumably the BAM that is input to STAR has an @rg record in the header, but that is removed from the output. It would be good if you reported this to the STAR developers so they can fix that at some point.
In general, we've found aligners to be bad at creating BAMs that pass the rigorous validation that is the default for Picard tools. That's why years ago we created MergeBamAlignment, which extracts only the alignment info from the aligned BAM, and everything else is obtained from the unmapped BAM.
Absent a fix from STAR, there are a few ways to fix this problem:
-
Presumably the RG tags that are causing the problem are produced by whatever tool you are using to create the original unmapped BAM. Not emitting RG tags would solve the problem. This doesn't appear to be possible with Picard IlluminaBasecallsToSam.
-
You can disable this validation by passing
VALIDATION_STRINGENCY=LENIENT
orVALIDATION_STRINGENCY=SILENT
to SortSam. You might also need to pass this to MergeBamAlignment. I dislike giving this option because disabling validation opens the door to all sorts of problems that are hard for us to support. -
You can convert to FASTQ as we recommend.
Regards, Alec
Hi Alec,
Thank you so much for the clarification. Very helpful.
Bests,
Y