mozack/abra2

Best use cases?

Hey Lisle,

Great to see a new implementation of ABRA in the works! Just curious: for which applications would you recommend switching from ABRA to ABRA2 as it stands currently? I'm not sure how alpha certain aspects are, or how niche the development and overall use case are meant to be.

Appreciate any feedback!

Hi,

  • The primary motivation for ABRA2 is to improve variant calling / expressed variant counting in RNA. While development is still in progress, this has been run on a few hundred RNA samples and the results are looking good. We have yet to assess impact on expression counts in general.

  • Scalability is substantially improved, so whole genomes can now be processed. Initial tests against one of the ICGC challenge sets look good. However, we have not tested much beyond this.

  • We've heard from a couple of users that an experimental non-assembly-based mode has worked OK on amplicon data. We have hardly tested this, though.

  • In general, ABRA2 should be more stable than ABRA. We've had to tweak ABRA params for "noisy" samples from time to time - this should no longer be necessary.

  • Theoretically, performance on complex indels should be improved; however, this has not yet been tested.

  • The structural variant detection option in ABRA is not available in ABRA2.

  • Some additional usability improvements are also in the works.

Keep in mind that development is still ongoing and there will be some hiccups and regressions along the way. Aside from that, ABRA2 can in general be used on any dataset supported by the original ABRA.

Thanks for the excellent overview of ABRA2. We have a handful of test datasets across different assay types that ABRA previously failed on for various reasons (complex indels, noisy exome, amplicon data, etc.).
I'll take a look at these with ABRA2 and let you know how well they perform. I also look forward to the increased scalability/speed, especially for whole genomes.

Hi @mozack

Are there any benchmark results on whole genome germline DNA data? Should I tune the parameters to use ABRA2 on DNA data? Is hg38 supported?

Thanks

We are benchmarking somatic calling against a whole genome dataset now. We will post results when they are available.

The default params should be a reasonable starting point for DNA; however, you may need to customize them for your own dataset.

hg38 is fine and encouraged.

Just to clarify - we plan to benchmark WGS germline DNA data; however, this has not yet begun.

Also, while we are using hg38, we are not currently using the alt contigs nor do we have any special alt contig handling in place.

Thanks for the clarification.

I plan to use ABRA2 for trio DNA data. Should I pass the BAM files of all 3 samples together as input? Does ABRA2 realign the samples jointly? Thanks.

Yes, processing the 3 samples together is recommended and they are realigned jointly.
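
For anyone wanting to script this, here is a minimal sketch of how a joint trio run might be assembled. It is not an official recipe. It assumes the comma-separated `--in`/`--out` lists and the `--ref`, `--threads`, and `--tmpdir` options as shown in the ABRA2 README usage example, so please check them against the help output of your ABRA2 version. The file names, the `abra2.jar` path, the `-Xmx16G` heap size, and the helper name `build_abra2_command` are all hypothetical.

```python
import shlex
from pathlib import Path


def build_abra2_command(bams, ref, out_dir, jar="abra2.jar", threads=8, tmpdir="/tmp"):
    """Build one ABRA2 invocation that takes all input BAMs together.

    Flag names are assumed from the ABRA2 README usage example;
    verify them against your installed version before running.
    """
    if not bams:
        raise ValueError("at least one input BAM is required")
    out_dir = Path(out_dir)
    # One output BAM per input BAM, listed in the same order as the inputs.
    outputs = [out_dir / (Path(b).stem + ".abra.bam") for b in bams]
    return [
        "java", "-Xmx16G", "-jar", str(jar),       # heap size is a placeholder
        "--in", ",".join(str(b) for b in bams),    # all samples in a single run
        "--out", ",".join(str(o) for o in outputs),
        "--ref", str(ref),
        "--threads", str(threads),
        "--tmpdir", str(tmpdir),
    ]


if __name__ == "__main__":
    # Hypothetical trio file names.
    cmd = build_abra2_command(
        ["father.bam", "mother.bam", "child.bam"],
        ref="hg38.fa",
        out_dir="realigned",
    )
    # Print the command for inspection; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

The point of the sketch is only that the three samples go into a single invocation rather than three separate runs, with one output BAM per input.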

ABRA2 should be usable for the originally mentioned use cases now. A manuscript describing it is now available:

https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btz033/5289536