Support for subsampling alignment to uniform coverage
Closed this issue · 5 comments
Hey,
Great tool.
Are there any plans to support BAM files (ideally with a downsampled BAM file as output)?
At the moment, if I want to do this, I would need to:
- convert BAM to fasta (using samtools fasta -F 4)
- downsample with Rasusa
- use the read ids in the downsampled fasta to filter my BAM
As such, it would be a lot easier if Rasusa could support BAM files :)
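The last step of the workflow above (filtering the alignment by the read ids that survived downsampling) can be sketched in Python. This is only a minimal illustration under some assumptions: the alignment is available as SAM text (for example via samtools view -h), the read names in the fasta match the QNAME column exactly, and the function names read_ids_from_fasta and filter_sam_by_ids are hypothetical helpers, not part of Rasusa or samtools.

```python
from typing import Iterable, Iterator, Set


def read_ids_from_fasta(lines: Iterable[str]) -> Set[str]:
    """Collect read ids: the first whitespace-delimited token after '>'."""
    ids: Set[str] = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def filter_sam_by_ids(sam_lines: Iterable[str], keep: Set[str]) -> Iterator[str]:
    """Yield header lines plus every alignment record whose QNAME is in `keep`."""
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines start with '@'; a QNAME cannot contain that character.
            yield line
        elif line.split("\t", 1)[0] in keep:
            yield line


if __name__ == "__main__":
    # Hypothetical file names, used only for illustration.
    with open("downsampled.fasta") as fasta, open("aln.sam") as sam:
        keep_ids = read_ids_from_fasta(fasta)
        for record in filter_sam_by_ids(sam, keep_ids):
            print(record, end="")
```

Because the filter works on read names, every entry belonging to a kept read is retained, including any secondary or supplementary alignments. If the conversion step altered the read names (for example by appending a mate suffix), they would need to be normalized before matching.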
Hi @IsmailM
I'm glad you're finding the tool useful.
Great question. I have some reservations around supporting BAM files as they are not quite as straightforward as fastq/a. For instance, there is the issue of reads having multiple entries in a BAM if there are secondary/supplementary alignments. That is, if the random subsample chooses a secondary alignment entry, should it also keep the primary alignment entry?
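One way to frame that question is to sample read names rather than individual alignment entries, so that a read's primary, secondary, and supplementary records are kept or dropped together. The sketch below assumes SAM text records and a hypothetical helper, subsample_by_read_name. It keeps a fixed fraction of reads at random and does not target a coverage, so it should not be read as how rasusa works.

```python
import random
from typing import List

SECONDARY = 0x100      # SAM FLAG bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM FLAG bit: supplementary alignment


def is_primary(record: str) -> bool:
    """True if the record is neither a secondary nor a supplementary alignment."""
    flag = int(record.split("\t")[1])
    return not flag & (SECONDARY | SUPPLEMENTARY)


def subsample_by_read_name(
    records: List[str], fraction: float, seed: int = 0
) -> List[str]:
    """Keep every record of a random `fraction` of reads (hypothetical helper)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    # Reads are chosen via their primary record, so each read counts once
    # no matter how many secondary/supplementary entries it has.
    names = sorted({r.split("\t", 1)[0] for r in records if is_primary(r)})
    rng = random.Random(seed)
    chosen = set(rng.sample(names, round(len(names) * fraction)))
    # All entries of a chosen read are kept, in their original order.
    return [r for r in records if r.split("\t", 1)[0] in chosen]
```

With this design a secondary entry is never selected on its own: it is carried along only when its read was chosen, which sidesteps the question at the cost of sampling reads instead of alignment entries.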
In the meantime, as you say, your workaround would be the best solution. The added benefit of your solution is that you can apply filtering via samtools prior to feeding into rasusa. As I have mentioned elsewhere, it is not my intention to introduce any kind of filtering options for filetypes in rasusa. The reason for this is that the tool would not strictly be taking a random subsample then. As such, even if I were to implement BAM support you would likely still end up needing to do at least steps 1 and 2 from your current workflow.
Thank you for the feature request nonetheless. If, after discussions, we decide BAM support is not going to happen, I would still very much appreciate input on a code snippet I could add to the README for others trying to do the same thing as you.
@IsmailM I just came across VariantBam, which seems to do what you're after I think?
I would also appreciate input on the code snippet for how to downsample from a bam and end up with a bam again (without re-aligning) :)
Coincidentally, I have been thinking about this feature lately. Depending on how I go over the next few weeks I may look at implementing this feature.
Okay, this is implemented in v1.0.0 in the subcommand aln. Please try it out and report any issues.