openvax/varlens

Output .bam

JPFinnigan opened this issue · 0 comments

Spoke w/ @timodonnell via email. Asked to post an issue

varlens-reads can be used to subset an input .bam containing only those reads that overlap specific coordinate flags (including coordinates provided by .vcf files). A newly created .bam can be refined to include only those reads that overlap a locus, and meet certain additional filters such as strandedness or MapQ scores

Per the documentation:

    --min-mapping-quality 30 \
    --locus 22:46932040-46932050 \
    --out /tmp/result.bam

I'm wondering whether it would be possible to also add the capability to varlens (or Isovar for that matter) to subset an input .bam to only those reads overlapping a specified locus, and containing a specified base at said locus. For example

$ varlens-reads test/data/CELSR1/bams/bam_0.bam \
--locus 22:46932040-46932041 \
--is-'A'
--out /tmp/result.bam  

Output result.bam would consist of all reads that overlap Chr22:46932040-46932041 (irrespective of start/end sites), and contain the specified base 'A' at this position. If possible, the process could also accept coordinates and bases supplied by a.vcf. However, in when a .vcf is suppled the input result.bam contains those only reads that contain the ALT base, at loci specified by the .vcf

The use-case I have in mind is to generate allele-specific coverage tracks for a particular locus (or set of loci) to compare across treatment conditions. The purpose being facile comparison of--for example--the locus of a putative neoantigen locus pre-/post-treatement showing an immuno-editing event.

The closest example of what I'm thinking of would be the typical CHIP-Seq peak data visualized using GenomeBrowser (e.g. see below), where comparison of multiple pileup "tracks" each corresponding to treatment groups, where a gain/loss of a peak in one/more tracks is indicative of some biologic process.

http://www.bloodjournal.org/content/bloodjournal/127/24/2991/F2.large.jpg?width=800&height=600&carousel=1