How to extract the list of the plots present in multiqc_data.json

Question


blavetn opened this issue 3 years ago · 6 comments

Hello,
I would like to know whether there is a way to list the names of the plots present in a report from multiqc_data.json.
The reason is that I am trying to extract adapter information from the "fastqc_adapter_content_plot". In one case this plot does not exist in the report, because the HTML MultiQC report says "No samples found with any adapter contamination > 0.1%".
So I need a way to test whether a plot is present in my report.
multiqc_data.json.txt

Answer 1 · 2021-09-29T05:50:00.000Z

Hi, I believe that when a plot doesn't exist in the report, TidyMultiqc will simply skip it.

If so, you should be able to check for the presence or absence of the corresponding column in the output data frame. If that isn't the case, could you please post the error you are getting? Thanks!

Answer 2 · 2021-09-29T06:19:42.000Z

My command is the following:

adaptor_max = load_multiqc("multiqc_data.json", sections = "plots",
  plot_opts = list(fastqc_adapter_content_plot = list(
    extractor = extract_ignore_x, summary = list(max = max), prefix = "adaptor")))

Error: Can't subset columns that don't exist.
x Location 1 doesn't exist.
ℹ There are only 0 columns.
Run rlang::last_error() to see where the error occurred

Answer 3 · 2021-09-29T06:35:36.000Z

Okay, I think this happens because the code breaks when the final data frame ends up with no columns at all. A simple workaround is to also extract a section that is always present, e.g. sections = c("general", "plots"), so that the result is never empty. I do hope to fix this properly eventually. That said, I'm not sure I will expose a way to list the plot names before they are parsed, because I can't think of many uses for such a feature.
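If you do want to check for a plot before parsing, a rough sketch is to read multiqc_data.json directly with jsonlite. This assumes MultiQC stores per-plot data under a top-level `report_plot_data` key whose names are the plot IDs (which matches MultiQC 1.x output as far as I know, but check your own file). `list_plot_ids()` and `has_plot()` are hypothetical helpers, not part of TidyMultiqc.

```r
library(jsonlite)

# Hypothetical helper (not part of TidyMultiqc): returns the IDs of all plots
# stored in a MultiQC data file. Assumes plot data lives under the top-level
# "report_plot_data" key, as in MultiQC 1.x output.
list_plot_ids <- function(path) {
  data <- jsonlite::read_json(path)
  # names(NULL) is NULL, so a file without that key yields no plot IDs
  names(data$report_plot_data)
}

# Hypothetical helper: TRUE if the given plot ID is present in the file
has_plot <- function(path, plot_id) {
  plot_id %in% list_plot_ids(path)
}
```

For example, `has_plot("multiqc_data.json", "fastqc_adapter_content_plot")` could decide whether to request that plot in `plot_opts` at all, instead of checking for the output column afterwards.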

Answer 4 · 2021-09-29T07:08:41.000Z

Thank you! Using the following code, I managed to solve my problem:

adaptor_max = load_multiqc("multiqc_data.json", sections = c("general", "plots"),
  plot_opts = list(fastqc_adapter_content_plot = list(
    extractor = extract_ignore_x, summary = list(max = max), prefix = "adaptor")))

if ("plot.adaptor.max" %in% colnames(adaptor_max)) {
    max_adaptor = round(max(adaptor_max$plot.adaptor.max, na.rm = TRUE), 0)
} else {
    # In some cases MultiQC does not report the plot at all, because
    # "No samples found with any adapter contamination > 0.1%"
    max_adaptor = 0.1
}

Answer 5 · 2021-09-29T07:14:21.000Z

Great, that is exactly the solution I had in mind. I'll keep this issue open until it's properly fixed.

Answer 6 · 2021-11-27T09:06:17.000Z

This should now be fixed in version 1.0.0, and covered by a test. However, to get the fix you will have to update your code to accommodate the breaking changes in that release.