Q: Why show all donors not just the relevant ones?
corneliusroemer opened this issue · 6 comments
The first question is easy to answer: you used --unique 0, which basically means: show me all donors which are probably responsible for at least 0 mutations, or even shorter: show them all. I did not think about that possibility, and I just realized that --unique 0 might be equivalent to --force-all-parents, which makes that option somewhat redundant.
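To illustrate why a threshold of 0 lets every donor through, here is a minimal Python sketch. It is not the tool's actual code: the function `select_donors`, the donor names and the mutation counts are all hypothetical, and the sketch only assumes that --unique N keeps donors credited with at least N mutations, as described above.

```python
def select_donors(unique_counts, min_unique):
    """Keep donors credited with at least `min_unique` mutations.

    `unique_counts` maps a donor name to the number of mutations
    attributed to it (hypothetical data structure, for illustration only).
    """
    return [name for name, count in unique_counts.items() if count >= min_unique]


# Made-up example data: donor "C" explains no mutations at all.
counts = {"A": 12, "B": 7, "C": 0}

all_donors = select_donors(counts, 0)       # every count is >= 0, so nothing is filtered
relevant_donors = select_donors(counts, 1)  # drops donors that explain nothing
```

With a threshold of 0 the condition `count >= 0` is always true, which is why the result matches what an option that forces all parents would show.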
The second one… can I just say that my concept of "intermissions" is very weird and should probably be scrapped anyway?
But there's also an actual answer to your question: there are 10 intermissions detected, and if --max-intermission-count is not explicitly set, it defaults to 8. This means that 8 of the 10 are exempt from the breakpoint calculation; the remaining 2 still create 2 breakpoints each, which, together with the one "actual" breakpoint, makes 5.
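The arithmetic can be written down as a small Python sketch. This is only a reading of the explanation above, not the tool's implementation: the function name `effective_breakpoints` and its parameters are hypothetical, and it assumes every intermission beyond the allowed count contributes exactly 2 breakpoints.

```python
def effective_breakpoints(actual_breakpoints, intermissions, max_intermission_count=8):
    """Count breakpoints, treating excess intermissions as 2 breakpoints each.

    The default of 8 mirrors the default mentioned for
    --max-intermission-count; everything else is an assumption.
    """
    # Intermissions up to the allowed count are exempt.
    excess = max(0, intermissions - max_intermission_count)
    # Each non-exempt intermission enters and leaves a region: 2 breakpoints.
    return actual_breakpoints + 2 * excess


reported = effective_breakpoints(actual_breakpoints=1, intermissions=10)
# 10 - 8 = 2 excess intermissions, 2 * 2 = 4, plus 1 actual breakpoint = 5
```

Raising the allowed count to 10 in this sketch would leave only the single actual breakpoint.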
Honestly, I'm not sure if this concept is broken, or if it is actually useful and just needs better documentation and a more intuitive output.
I see, thanks for explaining. I took that query from your README, where it says use this to see more potential recombinants.
Max intermission count now also makes sense - now I understand how this came about. In this case I'd want different settings for both parameters :)
> took that query from your README, where it says use this to see more potential recombinants.
Oops 😅 I will update the query in the README to --unique 1.
> In this case I'd want different settings for both parameters
Which two parameters do you mean? Currently there's --breakpoints, --max-intermission-length and --max-intermission-count. What's missing?
The idea of a threshold of intermissions above which you count things as breakpoints is weird. I guess it's necessary to prevent messy sequences from having apparently low numbers of breakpoints...
It's just confusing that these things start counting as breakpoints. You could keep them as a penalty as if they were breakpoints - but not call them as such? Do you see what I mean?
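One possible reading of this suggestion, sketched in Python: keep the real breakpoints and the intermission penalty as two separate numbers, and only add them up when comparing against a limit. Everything here is hypothetical (the `Score` type, the `score` function, and the rule of 2 penalty points per excess intermission); it is not how the tool currently works.

```python
from typing import NamedTuple


class Score(NamedTuple):
    """Breakpoints and intermission penalty, reported separately (hypothetical)."""
    breakpoints: int
    intermission_penalty: int

    @property
    def total(self):
        # The combined value is what would be compared against a limit.
        return self.breakpoints + self.intermission_penalty


def score(actual_breakpoints, intermissions, max_intermission_count=8):
    """Penalise excess intermissions without relabelling them as breakpoints."""
    excess = max(0, intermissions - max_intermission_count)
    return Score(breakpoints=actual_breakpoints, intermission_penalty=2 * excess)


result = score(actual_breakpoints=1, intermissions=10)
# result.breakpoints == 1, result.intermission_penalty == 4, result.total == 5
```

The filtering behaviour would stay the same as before, but the output could then say "1 breakpoint, penalty 4" instead of "5 breakpoints".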
I think I know what you mean - but I'm not yet convinced that it will be less confusing. I believe there must be a better solution.