denoise-*: expose `--collapse-no-mismatch` parameter
nbokulich opened this issue · 10 comments
References
forum xref
I can take this one 👍
How would we want to implement this? That doesn't appear to be a parameter, but a standalone function within the dada2
package.
collapseNoMismatch <- function(seqtab, minOverlap=20, orderBy="abundance", vec=TRUE, verbose=FALSE)
By expose is it adding a new method to q2-dada2
, or having this flag run it using strictly the default parameter values?
I suspect the latter is what we're looking for, since it sounds like the forum user was interested in combining subsequences together. I'm a little curious as to what minOverlap
is for, since it seems like we're dealing with +- a few nucleotides, so having a subsequence of only 20ish sounds pretty bad (and will probably bin into the most abundant read no matter what).
That being said, a new method would let us re-use this elsewhere...
that forum post seemed to suggest that this was a parameter, so that was my assumption when I opened this issue without further investigation.
I like the new method idea though.
I'm not sure the best way to handle adding the function inline w/ denoise, as it'd require the flag to run it or not and also a flag for the overlap parameter - assuming all the other default values are fine to use. I'll start with just the function I suppose. Would joining them have to be a pipeline then? It could always be added inline to denoise as well and keep this as a separate function. ¯\_(ツ)_/¯
I'm not sure the best way to handle adding the function inline w/ denoise, as it'd require the flag to run it or not and also a flag for the overlap parameter
I think this would be conditionally run in the R scripts, right (like the chimera checking section)?
To "qiime" in, if this is something that is going to be integrated into the workflow commands, this may be best left until we migrate the plugin to the version 1.8 of the dada2 R package, as I am expecting to overhaul the q2-dada2 R scripts a bit at that time.
p.s. we are waiting on bioconda updates for that migration to happen: #85 (comment)
If I am not wrong, the 100% clustering of ASV can also be performed using q2-vsearch plugin ? So maybe there is no need to add a specific condition in q2-dada2 script?
What do you think?