a-h-b/dadasnake

Within-run pooling

Closed this issue · 4 comments

vmikk commented

Hello Anna!

This feature request is somewhat related to #6.

Currently, there are three DADA2 modes in Dadasnake: per-sample, pooled, and pseudo-pooled.
Unfortunately, 120 GB of RAM is not enough to perform pooled inference on our data.
So we are now using sample-wise removal of sequencing errors (dada_dadaReads.single.R, to be exact).
However, it is possible to perform within-run-pooling.

For this, one can take the errors/models.{run}.RDS generated for each run and run dada_dadaReads.pool.R with the FASTQ files from the same run as input.

To my surprise, it was much faster (but of course more RAM-demanding) than sample-wise inference, because of the issue mentioned in #6. This mode avoids spawning multiple tasks to create merged/{run}/{sample}.RDS and directly produces merged/dada_merged.{run}.RDS. And, in theory, it should have more power to resolve ASVs than sample-wise inference.
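
To make the difference in task count concrete, here is a minimal sketch of how samples would be grouped into denoising jobs under each mode. It is not Dadasnake code: the sample sheet, the `plan_jobs` helper, and the labels `per_sample` and `pooled` are hypothetical, and only the grouping logic is illustrated.

```python
from collections import defaultdict

# Hypothetical sample sheet of (run, sample) pairs; names are illustrative only.
SAMPLES = [
    ("run1", "s1"), ("run1", "s2"), ("run1", "s3"),
    ("run2", "s4"), ("run2", "s5"),
]


def plan_jobs(samples, mode):
    """Return a list of sample groups, one group per denoising job."""
    if mode == "per_sample":
        # One job per sample: only the sample itself informs inference.
        return [[sample] for _, sample in samples]
    if mode == "pooled":
        # A single job covering every sample in the study.
        return [[sample for _, sample in samples]]
    if mode == "within_run":
        # One job per run, so each job can reuse that run's error model.
        by_run = defaultdict(list)
        for run, sample in samples:
            by_run[run].append(sample)
        return [by_run[run] for run in sorted(by_run)]
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    for mode in ("per_sample", "within_run", "pooled"):
        jobs = plan_jobs(SAMPLES, mode)
        print(f"{mode}: {len(jobs)} job(s) -> {jobs}")
```

With this toy sheet, sample-wise inference needs one job per sample, whereas within-run pooling needs only one job per run.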

With kind regards,
Vladimir

a-h-b commented

Hi Vladimir-
The reason I originally didn't want to include this kind of workflow has to do with what I want to influence the ASVs found in each sample. Currently dadasnake offers two options: 1) nothing except the sample itself influences the sample (not pooled), 2) the same samples influence all samples in a study (pooled). Within-run pooling kind of breaks this logic, and in the worst case ASV detection is strongly influenced by which run a sample was on, especially if the runs differ in size. But I do see that it has its advantages.
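
As a rough illustration of the caveat, the sketch below counts how many samples inform inference for a given sample under each mode. The run assignment, the `pool_size` helper, and the labels `per_sample` and `pooled` are hypothetical and not part of dadasnake. The run sizes are deliberately unequal.

```python
from collections import Counter

# Hypothetical run assignment: runA has 2 samples, runB has 6.
RUN_OF = {
    "a1": "runA", "a2": "runA",
    "b1": "runB", "b2": "runB", "b3": "runB",
    "b4": "runB", "b5": "runB", "b6": "runB",
}


def pool_size(sample, mode, run_of=RUN_OF):
    """Number of samples (including itself) that inform inference for `sample`."""
    if mode == "per_sample":
        return 1
    if mode == "pooled":
        # Every sample in the study contributes, regardless of run.
        return len(run_of)
    if mode == "within_run":
        # Only samples sequenced on the same run contribute.
        run_sizes = Counter(run_of.values())
        return run_sizes[run_of[sample]]
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    for sample in ("a1", "b1"):
        for mode in ("per_sample", "within_run", "pooled"):
            print(sample, mode, pool_size(sample, mode))
```

In the per-sample and pooled modes both samples see the same amount of context (1 and 8 samples here). With within-run pooling, the sample on the small run sees 2 and the sample on the large run sees 6, so the context depends on the run.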
I'll set the workflow up and document the caveat for future users. I'll include it in the next release.
Best wishes -
Anna

vmikk commented

Hello Anna!
I see your point as well. And the goal is, of course, to make ASV inference as robust and deterministic as possible.

Thank you for all your hard work on Dadasnake, this software is very helpful!
With kind regards,
Vladimir

a-h-b commented

Hi Vladimir -
So, the option is now in v0.7.6. You can use:
dada:
  pool: within_run
Have a lovely weekend -
Anna

vmikk commented

Hello Anna!
Wow, that's amazing! Thank you so much!

With kind regards,
Vladimir