small chain gaps cause unliftable alignments
Closed this issue · 3 comments
Hi @milkschen :)
I am trying trying to lift a set of simulated reads derived from CHM13 from CHM13->hg38, but I am getting a curiously high amount un-liftable alignments. I don't specify any unliftable/to-defer regions, and all the source alignments are MAPQ=99.
GitHub isn't letting me upload a small example bam but here are the commands I used along with stdout.
$ leviosam2 lift -C chm13v2-grch38.clft -a data/truth_bams/example.bam -t 8 -p out.lift -O bam -S mapq:30 -m -f data/chm13v2.0.fa
[I::add_split_rule] Adding rule `mapq:30`
[I::lift_run] Loading levioSAM index...done
[I::load_fasta] Loading FASTA...done
[I::WriteDeferred::print_info] Alignments with any of the below features are deferred:
- unlifted
- MAPQ (pre-liftover) < 30
[I::main] Finished in 10.7 CPU seconds, or 15.3877 wall clock seconds
$ leviosam2 collate -a out.lift-committed.bam -b out.lift-deferred.bam -p out.lift-paired
Inputs:
- BAM: out.lift-committed.bam
- BAM (deferred): out.lift-deferred.bam
Outputs:
- BAM (committed): out.lift-paired-committed.bam
- BAM (deferred): out.lift-paired-deferred.bam
- FASTQ1: out.lift-paired-deferred-R1.fq.gz
- FASTQ2: out.lift-paired-deferred-R2.fq.gz
Num. unpaired reads1: 6
Num. unpaired reads2: 7
Num. merged unpaired: 13
[I::collate_core] Extract 13 reads from BAM
Finished in 0 CPU seconds, or 0.021676 wall clock seconds
$ samtools view -c out.lift-committed.bam
65
$ samtools view -c out.lift-unliftable.bam
15
$ samtools view -c out.lift-deferred.bam
15
(the unliftable and deferred bams are equal here)
Here's an example of the results
Here, I color all the "unliftable" reads and their mates. Again, all these reads have extremely high mapq. That first BED track in the bottom panel shows lifted regions from CHM13->GRCh38. Notice that at least one mate from each colored pair overlaps that 4bp gap in the chain file.
Is a gap in the chain file supposed to cause a read to be unliftable/deferred, even if most of the read occurs outside of the gap? Is there an option in leviosam2 that lets me work around this?
Thanks,
Taher
Yes! The option is -G
. I use a default value of 0 for short reads and you can set it higher. For a non-zero -G
, sometimes it's useful to also activate the -x <config>
option to perform local realignment.
I see! setting -G>0
does appear to help! I'm curious as to why G=0
is default for short reads? Small indels are not rare in short reads, and it seems like a lot of work can be saved by committing the ones that overlap smaller chain gaps (without -x
).
It's mostly an efficiency vs. accuracy consideration. Lifting over short-read alignments overlapping gaps (even if small) can sometimes confuse downstream tools such as variant callers, so I turn it off as the default parameter. A future solution could be triggering re-alignment for alignments overlapped with a chain gap.