Relationship to filtlong
tseemann opened this issue · 2 comments
FYI - a comment from a colleague:
You can still use filtlong with the settings below to focus on quality only and more or less ignore length in the scoring metric:
--min_length 500
--mean_q_weight 10
--length_weight 1
--target_bases $((DEPTH * GENOMESIZE))
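For readers unfamiliar with the `$((DEPTH * GENOMESIZE))` arithmetic, here is a minimal sketch of how the settings above might be assembled into a single command line. The values of DEPTH and GENOMESIZE and the input file name reads.fastq.gz are assumptions chosen for illustration; they are not from the thread. The snippet only prints the command rather than running filtlong.

```shell
# Assumed example values (not from the thread): 50x depth, 5 Mbp genome.
DEPTH=50
GENOMESIZE=5000000

# Shell arithmetic expansion gives the total number of bases to keep.
TARGET_BASES=$((DEPTH * GENOMESIZE))

# Build the command line from the flags quoted above.
# reads.fastq.gz is a hypothetical input file name.
FILTLONG_CMD="filtlong --min_length 500 --mean_q_weight 10 --length_weight 1 --target_bases ${TARGET_BASES} reads.fastq.gz"

# Print the command instead of executing it, so the sketch runs without filtlong installed.
echo "$FILTLONG_CMD"
```

With these assumed values, the target works out to 250000000 bases (50 x 5,000,000).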
Yes, this is exactly how I have been using filtlong previously (exact same weights and all). There is still some read-length filtering happening here, which is a (subtle) bias. I am very keen to keep this project out of the filtering business, as there are already great tools for this.
Removing the --min_length option here is obviously much less biased, but there is still a scoring system at work, which is not strictly random. In my experience with these weightings, it does not focus purely on quality; there is definitely still some length-favouring that happens. I guess my aim with rasusa was to provide as few parameters as possible, i.e. users don't need to play with scoring weights etc. Maybe I am being silly and everyone will keep using filtlong, which is also fine.
I have a section in the motivation where I mention how filtlong can be co-opted to do something similar. Do you think I need to provide better clarification around filtlong?
I don't know if this is also of interest, but in my local benchmarking rasusa was significantly faster than filtlong. But I don't feel comfortable focusing on this, as I am not trying to compete with filtlong.
No worries - all good - thanks for the explanation.