Report number or fraction of reads that pass each step

Question

Closed this issue 7 years ago · 12 comments

I'm thinking we could just add printed reports to the standard output, which goes into the log file. The read tracking information is very useful for diagnosing problems.

Answer 1 · 2017-10-11T21:50:10.000Z

👍, this sounds really useful!

Originally came up on the forum here.

Answer 2 · 2017-11-16T22:17:59.000Z

👍 a user that I'm working with now has requested this information. We should now be able to do better than dumping it to standard out - we should be able to put the data into a qzv if/when we turn these methods into Pipelines.

Answer 3 · 2017-11-20T21:24:46.000Z

A visualization of something along these lines was requested in the Q2 forum as well: https://forum.qiime2.org/t/summary-statistics-after-dada2/1860/7

I am over-busy until my class ends, but adding this is high on my to-do list for December.

Answer 4 · 2017-11-21T00:17:55.000Z

I realized another way to do this would be to output stats artifacts which could then be viewable as metadata with qiime metadata tabulate... Anyway, @benjjneb, there are a few ways we could proceed with this, so just let us know when you're ready to work on it.

Answer 5 · 2017-12-12T17:09:55.000Z

Came up on the forum: x-ref

Answer 6 · 2017-12-12T17:37:19.000Z

another forum xref

Answer 7 · 2017-12-12T18:37:37.000Z

The updated R scripts in the 1.6 branch now perform this tracking. However, for now they simply report the results for the top few samples to stdout.

Often that will be sufficient, but it would be easy to write this out in tabular format as well, along the lines of @gregcaporaso's comment. Suggestions? What format would be appropriate for subsequent Q2 viewing?

Answer 8 · 2017-12-13T17:00:02.000Z

> ... it would be easy to write this out in tabular format as well, along the lines of @gregcaporaso's comment. Suggestions? What format would be appropriate for subsequent Q2 viewing?

I think this would be ideal.

@benjjneb, could you paste a few lines of the report that goes to stdout here (or point us at an example)? That'll help us to advise on the most appropriate format.

Answer 9 · 2017-12-13T18:03:43.000Z

Discussed this on Slack with @gregcaporaso. For the 2017.12 release next week, let's just go with the new stdout that @benjjneb described. That way users will have access to it in the debug/verbose logs, and it won't require any more work on our end once #78 is merged.

In the 2018.1 release, we can implement the better solution of outputting DADA2 stats that are viewable as Metadata (i.e. what @gregcaporaso suggested above). That solution should be fairly easy to implement, but it'll require a few pieces: a new semantic type, a file/directory format, and a transformer to view the stats as Metadata. The DADA2 methods denoise-single and denoise-paired will also need to become Pipelines. @benjjneb, we are happy to help with this for 2018.1. Having an example of the tabular output from DADA2 would be helpful when you have the chance.

Answer 10 · 2017-12-13T19:52:04.000Z

The output right now looks like this:

                                 input filtered denoised merged non-chimeric
F3D0_S188_L001_R1_001.fastq.gz    7793     7113     7113   6600         6572
F3D1_S189_L001_R1_001.fastq.gz    5869     5299     5299   5078         5067
F3D141_S207_L001_R1_001.fastq.gz  5958     5463     5463   5047         4928
F3D142_S208_L001_R1_001.fastq.gz  3183     2914     2914   2663         2600
F3D143_S209_L001_R1_001.fastq.gz  3178     2941     2941   2575         2550
F3D144_S210_L001_R1_001.fastq.gz  4827     4312     4312   3668         3517

Entries will all be integers (numbers of reads).
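
Since the issue title also asks about the fraction of reads passing each step, here is a minimal Python sketch of how the counts above could be turned into fractions of the input. It assumes the report has been captured as whitespace-separated text in exactly the layout shown; the names `parse_report` and `fraction_retained` are hypothetical helpers for illustration and are not part of DADA2 or QIIME 2.

```python
# Two rows copied from the report above; a real log would contain more.
REPORT = """\
                                 input filtered denoised merged non-chimeric
F3D0_S188_L001_R1_001.fastq.gz    7793     7113     7113   6600         6572
F3D1_S189_L001_R1_001.fastq.gz    5869     5299     5299   5078         5067
"""


def parse_report(text):
    """Parse the whitespace-separated report into (steps, {sample: {step: count}})."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    steps = lines[0].split()  # the header row has no label for the sample column
    table = {}
    for ln in lines[1:]:
        fields = ln.split()
        sample, counts = fields[0], [int(x) for x in fields[1:]]
        if len(counts) != len(steps):
            raise ValueError(f"unexpected number of columns for {sample}")
        table[sample] = dict(zip(steps, counts))
    return steps, table


def fraction_retained(steps, counts):
    """Return each later step's count as a fraction of the first (input) step."""
    total = counts[steps[0]]
    return {s: (counts[s] / total if total else 0.0) for s in steps[1:]}


steps, table = parse_report(REPORT)
for sample, counts in table.items():
    fractions = fraction_retained(steps, counts)
    print(sample, " ".join(f"{s}={v:.3f}" for s, v in fractions.items()))
```

Dividing by the input count rather than by the previous step keeps the numbers comparable across samples; a per-step ratio would be an equally reasonable choice for spotting which step loses the most reads.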

Answer 11 · 2017-12-13T22:33:30.000Z

Thanks for the example @benjjneb! Having that printed to stdout for now will be a nice incremental improvement for users debugging their data, and that output should be easy to turn into Metadata for the 2018.1 release.
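
To make the "turn into Metadata" idea concrete, here is a rough Python sketch that writes the same counts as a tab-separated file in the layout QIIME 2 metadata files use: an ID column named `sample-id`, plus a `#q2:types` row marking every column as numeric. This is only an illustration under stated assumptions, not the format or transformer the plugin actually ships. The function name `stats_to_metadata_tsv` is hypothetical, and the IDs here are the FASTQ file names from the report, whereas a real implementation would presumably use the sample IDs.

```python
import csv
import io

# Step names and two rows copied from the report earlier in this thread.
STEPS = ["input", "filtered", "denoised", "merged", "non-chimeric"]
COUNTS = {
    "F3D0_S188_L001_R1_001.fastq.gz": [7793, 7113, 7113, 6600, 6572],
    "F3D1_S189_L001_R1_001.fastq.gz": [5869, 5299, 5299, 5078, 5067],
}


def stats_to_metadata_tsv(steps, counts):
    """Render per-sample read counts as a metadata-style TSV string."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    # Header row: the first column identifies the sample.
    writer.writerow(["sample-id"] + list(steps))
    # Optional directive row declaring each column's type.
    writer.writerow(["#q2:types"] + ["numeric"] * len(steps))
    for sample, row in counts.items():
        if len(row) != len(steps):
            raise ValueError(f"unexpected number of counts for {sample}")
        writer.writerow([sample] + list(row))
    return buf.getvalue()


tsv_text = stats_to_metadata_tsv(STEPS, COUNTS)
print(tsv_text, end="")
```

A file written this way could then be passed to `qiime metadata tabulate`, as suggested earlier in the thread, to browse the counts in a qzv.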

Answer 12 · 2017-12-16T19:31:27.000Z

Fixed in #78