Issue with the "relative_support" filter
bioPG opened this issue · 1 comments
Forgot to reply to this, sorry.
It means this: Arriba counts the number of fusion candidates (events) involving a given gene. These could be true fusions or artifacts, although most of them will be artifacts. They need not even involve the same pair of genes (e.g., BCR-ABL1); Arriba counts all events affecting a given gene. The count serves as an estimate of the level of background noise for that gene. Highly expressed genes and hard-to-align regions are two examples of artifact-attracting regions that give rise to many events.

When a gene has many events, Arriba applies more stringent filtering to compensate for the increased level of background noise. By "more stringent filtering" I mean that Arriba requires events to have more supporting reads. This is the purpose of the relative_support filter: it passes only those events which have a sizable number of supporting reads relative to the level of background noise, i.e., the total number of events involving the gene. The relationship between the number of events and the minimum required number of supporting reads is modeled as a polynomial function.
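To make the idea concrete, here is a rough Python sketch of such a filter. It is not Arriba's actual code: the function names, the polynomial coefficients, and the choice to use the noisier of the two genes are all assumptions made up for illustration.

```python
from collections import Counter

# Hypothetical polynomial coefficients (constant, linear, quadratic terms).
# These are illustrative only and are NOT the values Arriba uses.
COEFFS = (1.0, 0.05, 0.001)


def min_required_reads(event_count, coeffs=COEFFS):
    """Minimum supporting reads as a polynomial in the gene's event count."""
    return sum(c * event_count ** i for i, c in enumerate(coeffs))


def relative_support_filter(events, coeffs=COEFFS):
    """Keep events whose read support is sizable relative to background noise.

    events: list of (gene1, gene2, supporting_reads) tuples.
    """
    # Count all events touching each gene, regardless of the partner gene.
    events_per_gene = Counter()
    for gene1, gene2, _ in events:
        events_per_gene[gene1] += 1
        if gene2 != gene1:
            events_per_gene[gene2] += 1

    passed = []
    for gene1, gene2, reads in events:
        # Assumption: judge the event by the noisier of its two genes.
        noise = max(events_per_gene[gene1], events_per_gene[gene2])
        if reads >= min_required_reads(noise, coeffs):
            passed.append((gene1, gene2, reads))
    return passed


# Made-up example data: one well-supported event, one event between two
# quiet genes, and 40 weakly supported events that all involve ACTB.
events = [("BCR", "ABL1", 20), ("GENE_A", "GENE_B", 2)]
events += [("ACTB", f"PARTNER{i}", 2) for i in range(40)]

print(relative_support_filter(events))
```

With these made-up numbers, an event with 2 supporting reads passes when both genes are quiet, but the same 2 reads are not enough for any of the 40 events involving ACTB, because the required minimum grows with the number of events.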
I hope this explanation is clearer.