qiime2/q2-dada2

Add pooling options to Q2 workflows


Improvement Description
Add a new option that lets users choose independent sample processing (the current behavior), pooled sample processing, or the "pseudo-pooling" mode added in dada2 1.7.5. It probably makes sense to wait until the dada2 R package 1.8 release is available (~June) before adding this.

The pooling options provide better detection of rare per-sample variants at the cost of increased computation time.
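In the dada2 R package these three modes correspond to the `pool` argument of `dada()`: `FALSE` (the default, independent), `TRUE` (pooled), and `"pseudo"` (pseudo-pooling). Below is a minimal sketch of how a plugin-level option might be translated into that argument. The mode names and the helper `r_pool_argument` are hypothetical, not the q2-dada2 API.

```python
# Hypothetical helper: translate a plugin-level pooling choice into the
# text of the `pool` argument for dada2's R function dada().
POOL_ARGUMENT = {
    "independent": "FALSE",  # each sample inferred on its own (dada2 default)
    "pooled": "TRUE",        # all samples pooled together for inference
    "pseudo": '"pseudo"',    # pseudo-pooling, added in dada2 1.7.5
}


def r_pool_argument(mode: str) -> str:
    """Return R source text such as pool=FALSE for the chosen mode."""
    if mode not in POOL_ARGUMENT:
        raise ValueError(
            f"unknown pooling mode {mode!r}; expected one of {sorted(POOL_ARGUMENT)}"
        )
    return "pool=" + POOL_ARGUMENT[mode]


print(r_pool_argument("pseudo"))  # prints: pool="pseudo"
```

Keeping the mapping in one dictionary means adding or renaming a mode is a one-line change, and unknown modes fail before any R code is run.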

Also consider making pseudo-pooling the default processing mode.

References
"pseudo-pooling" that was added in 1.7.5

Question: Can default parameter choices be dependent on other parameter choices?

The reason I ask: Pooled chimera removal is better if pooled sample inference is performed, but the default chimera removal is consensus, which is better for the default sample inference method (independent). So, can chimera removal be defaulted to pooled if the user selects pooled sample inference?

Not really. There will be a way to refine a parameter's type based on the other arguments passed (it should be available in a week or so), but that would categorically prevent mixing the two options.

A different approach would be to have the two steps be separate actions, and then in a pipeline which composes them, you have a "simpler" argument which unifies the two arguments. That way, the "default" invocation does the ideal thing for inference and chimera checking, but mixing them is still possible if you run the sub-actions directly.
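The pipeline idea can be sketched as two independent sub-actions plus a wrapper whose single argument fixes both of their settings. Everything here is hypothetical: the function names, the placeholder dictionaries standing in for feature tables, and the mode names. Only the two pairings described in the thread (independent with consensus, pooled with pooled) are included.

```python
from typing import List


# Hypothetical sub-actions; names and return values are placeholders,
# not the q2-dada2 API.
def infer_sample_variants(samples: List[str], pooling: str) -> dict:
    """Sub-action 1: sample inference with an explicit pooling mode."""
    return {"samples": list(samples), "pooling": pooling}


def remove_chimeras(table: dict, chimera_method: str) -> dict:
    """Sub-action 2: chimera removal with an explicit method."""
    return {**table, "chimera_method": chimera_method}


# The single "simpler" pipeline argument determines both sub-action arguments.
PIPELINE_MODES = {
    "independent": ("independent", "consensus"),
    "pooled": ("pooled", "pooled"),
}


def denoise_pipeline(samples: List[str], mode: str = "independent") -> dict:
    """Compose the two sub-actions behind one unified argument."""
    pooling, chimera_method = PIPELINE_MODES[mode]
    table = infer_sample_variants(samples, pooling)
    return remove_chimeras(table, chimera_method)


# Default-style invocation: inference and chimera settings are matched.
matched = denoise_pipeline(["s1", "s2"], mode="pooled")

# Mixing is still possible by calling the sub-actions directly.
mixed = remove_chimeras(infer_sample_variants(["s1"], "pooled"), "consensus")
```

The trade-off is that the pipeline cannot express a mismatched combination, which is intentional: users who need one drop down to the sub-actions.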

My current idea is to change the default chimera method to "auto", which chooses "consensus" or "pooled" chimera removal depending on the choice made at the sample inference step. Users will still be able to define the chimera removal method themselves in which case that choice will be used.

That achieves my goal here of defaulting to the "right" chimera removal method for each sample inference method, but let me know if that seems a bad idea.
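The proposed "auto" default amounts to a small resolution step run before chimera removal. This is a minimal sketch with a hypothetical function name. The thread does not say which method "auto" should pick for pseudo-pooling, so this sketch assumes consensus for every mode other than full pooling.

```python
def resolve_chimera_method(pooling: str, chimera_method: str = "auto") -> str:
    """Resolve a hypothetical "auto" chimera method from the pooling choice.

    An explicit user choice is always passed through unchanged. With "auto",
    pooled inference gets pooled chimera removal; every other mode gets
    consensus (for pseudo-pooling this is an assumption of the sketch).
    """
    if chimera_method != "auto":
        return chimera_method
    return "pooled" if pooling == "pooled" else "consensus"


print(resolve_chimera_method("independent"))         # consensus
print(resolve_chimera_method("pooled"))              # pooled
print(resolve_chimera_method("pooled", "consensus"))  # consensus
```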

That works too! There are a few places where we have similar patterns.

Thanks @benjjneb! We can try to coordinate efforts, too: if you want to pass things off in a semi-usable state, one of us can probably run it across the finish line.

The R code for pseudo-pooling in the Q2 plugin is working on my end in #122.