suhrig/arriba

Issue with the "relative_support" filter

bioPG opened this issue · 1 comment

bioPG commented

Can the sentence inside the red box be understood as follows? A fusion such as BCR-ABL1 may have multiple breakpoints, each corresponding to a separate event, and the number of events with different supporting read counts is polynomially related to the event itself.

[image]

suhrig commented

Forgot to reply to this, sorry.

It means this: Arriba counts the number of fusion candidates (events) involving a given gene. These could be true fusions or artifacts, and they need not even involve the same pair of genes (e.g., BCR-ABL1): Arriba counts all events affecting a given gene. The count serves as an estimate of the level of background noise. Highly expressed genes and hard-to-align regions are two examples of artifact-attracting regions that give rise to many events, most of which will be artifacts anyway.

When a gene has many events, Arriba applies more stringent filtering to compensate for the increased level of background noise. By "more stringent filtering" I mean that Arriba requires events to have more supporting reads. This is the purpose of the relative_support filter: it passes only those events which have a sizable number of supporting reads relative to the level of background noise, i.e., the total number of events. The relationship between the number of events and the minimum required number of supporting reads is modeled as a polynomial function.
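To make the idea concrete, here is a rough Python sketch of a filter of this kind. It is not Arriba's code (Arriba is written in C++). The coefficients in `COEFFS`, the `Event` class, all function names, and the choice to use the noisier of the two genes as the noise estimate are assumptions made up for illustration. Nothing in this thread states the real polynomial.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A fusion candidate (hypothetical structure, for illustration only)."""
    gene1: str
    gene2: str
    supporting_reads: int


# Hypothetical polynomial coefficients (c0, c1, c2); NOT Arriba's real values.
COEFFS = (2.0, 0.1, 0.001)


def events_per_gene(events):
    """Count all events touching each gene, regardless of the partner gene."""
    counts = Counter()
    for e in events:
        # A set avoids counting an intragenic event twice for the same gene.
        for gene in {e.gene1, e.gene2}:
            counts[gene] += 1
    return counts


def min_required_reads(n_events, coeffs=COEFFS):
    """Evaluate c0 + c1*n + c2*n^2: more events means a higher threshold."""
    return sum(c * n_events ** i for i, c in enumerate(coeffs))


def relative_support_filter(events, coeffs=COEFFS):
    """Keep events whose read support is sizable relative to background noise."""
    counts = events_per_gene(events)
    kept = []
    for e in events:
        # Assumption: take the noisier of the two genes as the noise estimate.
        noise = max(counts[e.gene1], counts[e.gene2])
        if e.supporting_reads >= min_required_reads(noise, coeffs):
            kept.append(e)
    return kept


# BCR has 31 events in total: one well-supported fusion plus 30 weak candidates.
events = [Event("BCR", "ABL1", 40)]
events += [Event("BCR", f"GENE{i}", 3) for i in range(30)]
# EML4 and ALK are "quiet" genes with a single event between them.
events.append(Event("EML4", "ALK", 3))

kept = relative_support_filter(events)
print([(e.gene1, e.gene2) for e in kept])  # [('BCR', 'ABL1'), ('EML4', 'ALK')]
```

In this toy data, three supporting reads are enough for EML4-ALK because neither gene has other events, but the same three reads are not enough for the 30 weak BCR candidates, since BCR's 31 events raise the required minimum.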

I hope this explanation is clearer.