Output .bam
JPFinnigan opened this issue · 0 comments
Spoke w/ @timodonnell via email. Asked to post an issue
varlens-reads
can be used to subset an input .bam
containing only those reads that overlap specific coordinate flags (including coordinates provided by .vcf
files). A newly created .bam
can be refined to include only those reads that overlap a locus, and meet certain additional filters such as strandedness or MapQ scores
Per the documentation:
--min-mapping-quality 30 \
--locus 22:46932040-46932050 \
--out /tmp/result.bam
I'm wondering whether it would be possible to also add the capability to varlens (or Isovar for that matter) to subset an input .bam
to only those reads overlapping a specified locus, and containing a specified base at said locus. For example
$ varlens-reads test/data/CELSR1/bams/bam_0.bam \
--locus 22:46932040-46932041 \
--is-'A'
--out /tmp/result.bam
Output result.bam
would consist of all reads that overlap Chr22:46932040-46932041 (irrespective of start/end sites), and contain the specified base 'A' at this position. If possible, the process could also accept coordinates and bases supplied by a.vcf
. However, in when a .vcf
is suppled the input result.bam
contains those only reads that contain the ALT base, at loci specified by the .vcf
The use-case I have in mind is to generate allele-specific coverage tracks for a particular locus (or set of loci) to compare across treatment conditions. The purpose being facile comparison of--for example--the locus of a putative neoantigen locus pre-/post-treatement showing an immuno-editing event.
The closest example of what I'm thinking of would be the typical CHIP-Seq peak data visualized using GenomeBrowser (e.g. see below), where comparison of multiple pileup "tracks" each corresponding to treatment groups, where a gain/loss of a peak in one/more tracks is indicative of some biologic process.