qiime2/q2-dada2

Split filtering into its own command?

Opened this issue · 7 comments

Improvement Description
As I understand it, QIIME intends to be more of a push-button pipeline than our R workflows, but I think it is worth considering separating the filtering step of the DADA2 pipeline from the sample processing. It is often useful to filter more than once to see which settings work well, and I imagine other filtering tools will come online in the QIIME 2 ecosystem that people may want to use instead. Splitting the steps also has the nice effect of reducing the number of parameters/options at each step and grouping them naturally.

Downsides: two commands to do what once took one, and it may need a new semantic type (e.g. FilteredVersionOfPreviousType).
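
To make the "filter more than once" idea concrete, here is a minimal Python sketch of filtering as a standalone step that can be rerun with different settings. It is not q2-dada2 or DADA2 code: `filter_reads`, `expected_errors`, and the toy reads are hypothetical, and the `trunc_len` and `max_ee` parameters are only assumed to behave roughly like a truncation length and a maximum expected-errors threshold.

```python
from typing import List, Tuple

Read = Tuple[str, List[int]]  # (sequence, per-base Phred quality scores)


def expected_errors(quals: List[int]) -> float:
    """Sum of per-base error probabilities, 10 ** (-Q / 10)."""
    return sum(10 ** (-q / 10) for q in quals)


def filter_reads(reads: List[Read], trunc_len: int, max_ee: float) -> List[Read]:
    """Truncate reads to trunc_len; drop short or error-prone reads."""
    kept = []
    for seq, quals in reads:
        if len(seq) < trunc_len:
            continue  # too short to truncate to the requested length
        seq, quals = seq[:trunc_len], quals[:trunc_len]
        if expected_errors(quals) <= max_ee:
            kept.append((seq, quals))
    return kept


# Toy data, made up for illustration only.
reads = [
    ("ACGTACGT", [30] * 8),
    ("ACGTACGT", [30, 30, 30, 30, 10, 10, 10, 10]),
    ("ACGT", [30] * 4),
]

# Because filtering is its own step, it is cheap to try several settings
# before committing to the (much slower) denoising step.
for trunc_len, max_ee in [(8, 0.1), (8, 1.0), (4, 0.1)]:
    kept = filter_reads(reads, trunc_len, max_ee)
    print(f"trunc_len={trunc_len} max_ee={max_ee}: kept {len(kept)} of {len(reads)}")
```

Each step then carries only its own parameters: the filtering options live on `filter_reads`, and nothing about them needs to appear on whatever consumes its output.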

I agree that it would make sense to do this. At this stage it makes sense to focus on making functionality modular. When we have Pipeline actions in the future, we can chain the more modular methods together for users who want the more "push button" functionality.

I'd prefer to hold off on this until after our 2017.3 release goes out, and instead target it for 2017.6. We're pretty swamped for 2017.3. The other benefit of waiting until 2017.6 is that we plan to add Pipeline actions in that release, so we likely wouldn't need to change the interface for denoise-single and denoise-paired. Those could become Pipelines (without users knowing that anything changed under the hood), and then we can add the new methods that split filtering into its own step.
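
A rough sketch of that idea in plain Python, under stated assumptions: it does not use the QIIME 2 framework or its Pipeline API, `filter_reads` and `denoise` are hypothetical stand-ins, and the "denoising" here is just a placeholder that counts identical sequences. The point is only that the existing entry point can keep its signature while delegating to smaller steps.

```python
from collections import Counter
from typing import Dict, List


def filter_reads(seqs: List[str], trunc_len: int) -> List[str]:
    """Hypothetical standalone filtering step: truncate, drop short reads."""
    return [s[:trunc_len] for s in seqs if len(s) >= trunc_len]


def denoise(filtered: List[str]) -> Dict[str, int]:
    """Placeholder for denoising: only counts identical sequences."""
    return dict(Counter(filtered))


def denoise_single(seqs: List[str], trunc_len: int) -> Dict[str, int]:
    """Same signature as before; now it simply chains the modular steps."""
    return denoise(filter_reads(seqs, trunc_len))


seqs = ["ACGTAC", "ACGTTT", "ACG", "ACGTGG"]

# Push-button use is unchanged for existing callers.
print(denoise_single(seqs, trunc_len=4))

# Users who want more control can run the steps separately
# and inspect the intermediate result.
filtered = filter_reads(seqs, trunc_len=4)
print(len(filtered), denoise(filtered))
```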

Is there a GitHub issue somewhere where I can follow the progress of Q2 Pipelines?

Yes! @ebolyen is actually working on that right now, and the plan is to have basic pipeline support in place for this release cycle (2017.10). Here's the issue and corresponding pull request:

qiime2/qiime2#86
qiime2/qiime2#348

Pipelines are now merged!

It might be nice to include the plotErrors visualization as part of this process (of course, this would require an all-new visualizer in the plugin, but it could wrap the existing plotErrors results). This recently came up on the forum.

Is there an example Pipeline already implemented, or developer documentation on implementing Pipelines?

@benjjneb, the framework section in the release notes has some light documentation. And the core-metrics and core-metrics-phylogenetic actions are pipelines:

Feel free to ping us on Slack also!