ctmrbio/stag-mwc

Create diagnostics plot: barplot of tax composition of filtered Bracken table

Closed this issue · 6 comments

For the use case of standard run diagnostics, we discussed creating a basic barplot showing taxonomic composition based on Kraken2 (MiniKraken2_v2).

@jwdebelius, would you like to create a robust plotting script that takes as input a set of Kraken2 output files (kreport) and creates a barplot of taxonomic composition at a user-definable level? Preferably in PDF and PNG. Make sure the script can handle many diverse sample types and that the figure size adapts so the plot stays readable regardless of the number of samples included.
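To make the request concrete, here is a rough sketch of what such a script could look like. It assumes the standard six-column Kraken2 report layout (percent, clade reads, direct reads, rank code, taxid, indented name). The function names and the width-per-sample heuristic are hypothetical, not taken from the existing StaG scripts.

```python
import csv


def read_kreport(path, rank_code="G"):
    """Return {taxon_name: clade_read_count} for one rank code in a kreport file.

    Assumes the standard six-column Kraken2 report layout.
    """
    counts = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 6:
                continue  # skip blank or malformed lines
            if row[3].strip() == rank_code:
                # Names are indented to show tree depth, so strip the padding.
                counts[row[5].strip()] = int(row[1])
    return counts


def relative_abundance(counts):
    """Scale counts so the taxa at the chosen rank sum to 1."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()} if total else {}


def figure_size(n_samples, per_sample=0.25, min_width=6.0, height=5.0):
    """Grow the figure width (inches) with the number of samples."""
    return (max(min_width, per_sample * n_samples), height)


def plot_composition(samples, outbase):
    """Draw stacked bars from {sample: {taxon: fraction}}; write PDF and PNG."""
    # Imported here so the parsing helpers work without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # headless backend, suitable for cluster jobs
    import matplotlib.pyplot as plt

    names = sorted(samples)
    taxa = sorted({t for fractions in samples.values() for t in fractions})
    fig, ax = plt.subplots(figsize=figure_size(len(names)))
    bottoms = [0.0] * len(names)
    for taxon in taxa:
        heights = [samples[name].get(taxon, 0.0) for name in names]
        ax.bar(names, heights, bottom=bottoms, label=taxon)
        bottoms = [b + h for b, h in zip(bottoms, heights)]
    ax.set_ylabel("Relative abundance")
    ax.tick_params(axis="x", labelrotation=90)
    ax.legend(bbox_to_anchor=(1.02, 1), loc="upper left", fontsize="small")
    for ext in ("pdf", "png"):
        fig.savefig(f"{outbase}.{ext}", bbox_inches="tight")
    plt.close(fig)
```

The rank code (`"G"` for genus, `"S"` for species, and so on) is the user-definable level. Note that the fractions here are relative to reads assigned at that rank, not to all reads in the sample.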

Maybe we should also include a subplot showing a basic Alpha diversity metric for each sample as well? If the script computes an Alpha diversity measure, that measure also needs to be stored in table form (TSV?) so it can be included in the default report.

Yes, I can definitely look into the plots.

For alpha diversity, do we want to do depth normalization (I'm not sure how with relative abundance), or should we look at something like Breakaway or do a rarefaction curve? Picking a normalization depth seems harder to do programmatically, so maybe that's something to discuss more?

I agree the diversity discussion is more complex. Let's break that part of this out into its own issue: #47

Great that you want to look into making the plotting script. Have a look in the scripts folder to see examples of the other plotting scripts included in StaG (preferably look in the develop branch).

We're getting closer to the CTMR standard MGI workflow. I've recently merged a few PRs that concern Kraken2-related bits, specifically Bracken support with filtered output tables that I think are perfect as a data source for these diagnostic plots.

The most relevant output file is probably output_folder/kraken2/all_samples.S.filtered.bracken.tsv.
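A minimal sketch of preparing such a table for plotting follows. The layout is an assumption I have not checked against the actual file: a header row, taxon names in the first column, and one numeric column per sample. The function names and the "Other" label are hypothetical.

```python
import csv


def read_wide_table(path):
    """Return (sample_names, {taxon: [value per sample]}) from a TSV.

    Assumed layout: header row, taxon name in column 1, one numeric
    column per sample after that.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        samples = header[1:]
        table = {row[0]: [float(v) for v in row[1:]] for row in reader if row}
    return samples, table


def collapse_to_top_n(table, n=10, other_label="Other"):
    """Keep the n taxa with the highest summed abundance and pool the rest."""
    ranked = sorted(table, key=lambda taxon: sum(table[taxon]), reverse=True)
    collapsed = {taxon: table[taxon] for taxon in ranked[:n]}
    rest = ranked[n:]
    if rest:
        # Sum the remaining taxa column-wise, one total per sample.
        collapsed[other_label] = [sum(vals) for vals in zip(*(table[t] for t in rest))]
    return collapsed
```

Pooling low-abundance taxa keeps the legend short, which helps readability when many diverse samples end up in the same figure.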

Here are a couple of options for the plots, before tackling an environment and a production model in Snakemake.

Single area plot:

[image: single_8_set3]

Join area plot:

[image: join]

I like both of them. Not sure which one will be the best fit for production diagnostics in StaG, but having the option to do both sounds great, then it can also be useful to users as a generic utility plotting script as well!

Nice work, merged the PR to develop now, so I'll close this.