milkschen/leviosam2

small chain gaps cause unliftable alignments

Closed this issue · 3 comments

Hi @milkschen :)

I am trying trying to lift a set of simulated reads derived from CHM13 from CHM13->hg38, but I am getting a curiously high amount un-liftable alignments. I don't specify any unliftable/to-defer regions, and all the source alignments are MAPQ=99.

GitHub isn't letting me upload a small example bam but here are the commands I used along with stdout.

$ leviosam2 lift -C chm13v2-grch38.clft -a data/truth_bams/example.bam -t 8 -p out.lift -O bam -S mapq:30 -m -f data/chm13v2.0.fa
[I::add_split_rule] Adding rule `mapq:30`
[I::lift_run] Loading levioSAM index...done
[I::load_fasta] Loading FASTA...done
[I::WriteDeferred::print_info] Alignments with any of the below features are deferred:
 - unlifted
 - MAPQ (pre-liftover) < 30

[I::main] Finished in 10.7 CPU seconds, or 15.3877 wall clock seconds
$ leviosam2 collate -a out.lift-committed.bam -b out.lift-deferred.bam -p out.lift-paired
Inputs:
 - BAM: out.lift-committed.bam
 - BAM (deferred): out.lift-deferred.bam

Outputs:
 - BAM (committed): out.lift-paired-committed.bam
 - BAM (deferred): out.lift-paired-deferred.bam
 - FASTQ1: out.lift-paired-deferred-R1.fq.gz
 - FASTQ2: out.lift-paired-deferred-R2.fq.gz

Num. unpaired reads1: 6
Num. unpaired reads2: 7
Num. merged unpaired: 13
[I::collate_core] Extract 13 reads from BAM

Finished in 0 CPU seconds, or 0.021676 wall clock seconds

$ samtools view -c out.lift-committed.bam 
65
$ samtools view -c out.lift-unliftable.bam 
15
$ samtools view -c out.lift-deferred.bam 
15

(the unliftable and deferred bams are equal here)

Here's an example of the results
igv_snapshot

Here, I color all the "unliftable" reads and their mates. Again, all these reads have extremely high mapq. That first BED track in the bottom panel shows lifted regions from CHM13->GRCh38. Notice that at least one mate from each colored pair overlaps that 4bp gap in the chain file.

Is a gap in the chain file supposed to cause a read to be unliftable/deferred, even if most of the read occurs outside of the gap? Is there an option in leviosam2 that lets me work around this?

Thanks,
Taher

Yes! The option is -G. I use a default value of 0 for short reads and you can set it higher. For a non-zero -G, sometimes it's useful to also activate the -x <config> option to perform local realignment.

I see! setting -G>0 does appear to help! I'm curious as to why G=0 is default for short reads? Small indels are not rare in short reads, and it seems like a lot of work can be saved by committing the ones that overlap smaller chain gaps (without -x).

It's mostly an efficiency vs. accuracy consideration. Lifting over short-read alignments overlapping gaps (even if small) can sometimes confuse downstream tools such as variant callers, so I turn it off as the default parameter. A future solution could be triggering re-alignment for alignments overlapped with a chain gap.