broadinstitute/picard

Samtofastq does not retain all the information

ajeyab22 opened this issue · 5 comments

When converting a BAM file to FASTQ, SamToFastq does not retain tag information, namely the XN and XT tags present in the BAM file. When MarkIlluminaAdapters is used, most of the XT tags are added, but a few are still missing, and XN tags are not added at all.

@ajeyab22 The FASTQ format does not support tags. As for MarkIlluminaAdapters, the tool is not meant to add XN tags, and it only adds an XT tag to a read if it finds an adapter in that read. What you're seeing is the expected behavior for these tools.
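To illustrate why the tags cannot survive the conversion: a FASTQ record has only four lines (name, bases, separator, qualities), so the optional fields at the end of a SAM record have nowhere to go. Below is a minimal Python sketch of that mapping. The helper `sam_line_to_fastq` and the example record (including its tag values) are made up for illustration, and this is not how Picard implements SamToFastq.

```python
# Complement table used when a read is stored reverse-complemented (flag 0x10).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_line_to_fastq(sam_line: str) -> str:
    """Turn one tab-separated SAM alignment line into a FASTQ record.

    Hypothetical helper: only QNAME, FLAG, SEQ and QUAL are used.
    Everything from column 12 onward (the optional tags) is ignored,
    because FASTQ has no field to hold it.
    """
    fields = sam_line.rstrip("\n").split("\t")
    qname, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
    if flag & 0x10:
        # Restore the original read orientation for reverse-strand alignments.
        seq = seq.translate(COMPLEMENT)[::-1]
        qual = qual[::-1]
    return f"@{qname}\n{seq}\n+\n{qual}\n"


# Made-up unmapped read carrying XT and XN tags (values are illustrative only).
record = "read1\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGT\tIIIIIIII\tXT:i:5\tXN:i:1"
fastq = sam_line_to_fastq(record)
print(fastq, end="")
```

The printed record contains the read name, bases, and qualities, but no trace of `XT` or `XN`.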

Is there a command to add XN tags? I don't see any that could do that.

@ajeyab22 Tags starting with X, Y, or Z (like XT and XN) do not have guaranteed definitions, and different tools can use them for different things. https://www.samformat.info/sam-format-alignment-tags
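As a rough illustration of that rule, the SAM specification reserves tags that start with X, Y, or Z, and tags containing a lowercase letter, for local use, which is why their meaning depends on the tool that wrote them. A small Python sketch, where `is_local_use_tag` is a hypothetical helper name and not part of any library:

```python
def is_local_use_tag(tag: str) -> bool:
    """Return True if a two-character SAM tag is reserved for local use.

    Assumes the rule from the SAM specification: tags starting with
    X, Y or Z, or containing a lowercase letter, have no standard meaning.
    """
    if len(tag) != 2:
        raise ValueError(f"SAM tags are two characters long, got {tag!r}")
    return tag[0] in "XYZ" or any(c.islower() for c in tag)


# XT and XN are tool-specific; NM and RG are defined by the specification.
for tag in ("XT", "XN", "NM", "RG"):
    kind = "local use" if is_local_use_tag(tag) else "standard"
    print(f"{tag}: {kind}")
```

A tag flagged as local use here needs to be interpreted using the documentation of whichever tool produced the file.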

Based on the combination of XT and XN, the tags in your bam probably came from bwa, so rerunning bwa mem should get back the tags you are looking for. https://bio-bwa.sourceforge.net/bwa.shtml (scroll down to where it says "BWA generates the following optional fields. Tags starting with ‘X’ are specific to BWA.")

Note that this means the XT tags produced by MarkIlluminaAdapters are probably completely different from the XT tags in your original BAM, because bwa and Picard use XT for very different things.

kockan commented

Closing this issue for now. Feel free to reopen if there are any updates.