metagenome-atlas/atlas

Error in rule build_bin_report

Error in rule build_bin_report:
jobid: 91
input: reports/genomic_bins_DASTool.tsv
output: reports/bin_report_DASTool.html
log: logs/binning/report_DASTool.log (check log file(s) for error message)
conda-env: /gpfs/gpfs0/felles/.felles2/fres/Sample_1-S1R1/databases/conda_envs/1c7c41324558b982b4847af049d3fd83_

RuleException:
CalledProcessError in line 536 of /home/a40540/anaconda3/envs/py3xplus/lib/python3.10/site-packages/atlas/workflow/rules/binning.smk:
Command 'source /home/a40540/anaconda3/envs/py3xplus/bin/activate '/gpfs/gpfs0/felles/.felles2/fres/Sample_1-S1R1/databases/conda_envs/1c7c41324558b982b4847af049d3fd83_'; set -euo pipefail; python /gpfs/gpfs0/felles/.felles2/fres/Sample_1-S1R1/.snakemake/scripts/tmpl_2i0m4t.bin_report.py' returned non-zero exit status 1.
File "/home/a40540/anaconda3/envs/py3xplus/lib/python3.10/site-packages/atlas/workflow/rules/binning.smk", line 536, in __rule_build_bin_report
File "/home/a40540/anaconda3/envs/py3xplus/lib/python3.10/concurrent/futures/thread.py", line 58, in run



**Atlas version 2.13**

**Additional context**
I am unable to figure out a solution for this error.

What's in logs/binning/report_DASTool.log?

2022-12-06 09:21:28 Uncaught exception: Traceback (most recent call last):
File "/gpfs/gpfs0/felles/.felles2/fres/Sample_1-S1R1/.snakemake/scripts/tmpl_2i0m4t.bin_report.py", line 108, in <module>
div = make_plots(bin_table=snakemake.input.bin_table)
File "/gpfs/gpfs0/felles/.felles2/fres/Sample_1-S1R1/.snakemake/scripts/tmpl_2i0m4t.bin_report.py", line 66, in make_plots
fig = px.scatter(
File "/gpfs/gpfs0/felles/.felles2/fres/Sample_1-S1R1/databases/conda_envs/1c7c41324558b982b4847af049d3fd83_/lib/python3.9/site-packages/plotly/express/_chart_types.py", line 66, in scatter
return make_figure(args=locals(), constructor=go.Scatter)
File "/gpfs/gpfs0/felles/.felles2/fres/Sample_1-S1R1/databases/conda_envs/1c7c41324558b982b4847af049d3fd83_/lib/python3.9/site-packages/plotly/express/_core.py", line 1933, in make_figure
args = build_dataframe(args, constructor)
File "/gpfs/gpfs0/felles/.felles2/fres/Sample_1-S1R1/databases/conda_envs/1c7c41324558b982b4847af049d3fd83_/lib/python3.9/site-packages/plotly/express/_core.py", line 1405, in build_dataframe
df_output, wide_id_vars = process_args_into_dataframe(
File "/gpfs/gpfs0/felles/.felles2/fres/Sample_1-S1R1/databases/conda_envs/1c7c41324558b982b4847af049d3fd83_/lib/python3.9/site-packages/plotly/express/_core.py", line 1207, in process_args_into_dataframe
raise ValueError(err_msg)
ValueError: Value of 'hover_data_0' is not the name of a column in 'data_frame'. Expected one of ['Bin Id', 'Completeness', 'Contamination', 'Strain heterogeneity', '# unique markers (of 43)', '# multi-copy', 'Insertion branch UID', 'Taxonomy (contained)', 'Taxonomy (sister lineage)', 'GC', 'Genome size (Mbp)', 'Gene count', 'Coding density', 'Sample', 'Domain', 'phylum', 'class', 'order', 'family', 'Quality Score'] but received: genus

It seems none of the bins has a genus identified, which unfortunately breaks my code.

  • I should fix that.
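
One possible way to guard against this, as a minimal sketch only: keep just the hover columns that actually exist in the table before handing them to the plotting call. The helper name `present_columns` and the `wanted_hover` list are hypothetical; the real list in bin_report.py may differ. The table columns are an abbreviated copy of those listed in the error message above.

```python
def present_columns(wanted, available):
    """Return the entries of `wanted` that exist in `available`, keeping their order."""
    available = set(available)
    return [col for col in wanted if col in available]


# Abbreviated copy of the columns from the error message; note that 'genus' is absent.
table_columns = [
    "Bin Id", "Completeness", "Contamination", "Domain",
    "phylum", "class", "order", "family", "Quality Score",
]

# Hypothetical hover list (assumption, not taken from bin_report.py).
wanted_hover = ["genus", "family", "Quality Score"]

# Only columns that are really in the table survive the filter.
hover_data = present_columns(wanted_hover, table_columns)
print(hover_data)
```

The filtered list could then be passed as `hover_data=` to `px.scatter`, so that a missing taxonomic rank no longer raises a ValueError.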

The reason is that you have either very novel species or very bad bins.

In the first case it would make sense to continue atlas with --keep-going.
In the second case it doesn't make sense to proceed.

Can you check reports/genomic_bins_DASTool.tsv? Are you happy with the genomes recovered?

We are expecting novel bacteria, as the sample is from an extreme environment. But your mention of very bad bins makes me nervous. Here are the ranges for the different columns:
completeness : 98.22 - 16.3
contamination : 34.36 - 0
strain heterogeneity : 93.75 - 0
Is that helpful? Please let me know. I can share part of the results.

Cool, could I ask which extreme environment?

To recapitulate, the error occurs simply because atlas cannot plot the output of the binning when the genus is not defined.
I suggest you plot the data in the table reports/genomic_bins_DASTool.tsv yourself, similar to this one: https://github.com/metagenome-atlas/Tutorial/blob/master/Tutorial/images/quality.svg

What you are looking for are genomes that have >50% completeness and <10% contamination. Maybe you even want to relax these requirements a bit in your case.
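
As a rough sketch of that check, assuming the table is tab-separated and has the 'Bin Id', 'Completeness' and 'Contamination' columns listed in the error message, the bins passing both thresholds could be selected like this. The function name `good_bins` and the example rows are made up for illustration; they are not real results.

```python
import csv
import io


def good_bins(handle, min_completeness=50.0, max_contamination=10.0):
    """Return the ids of bins above the completeness and below the contamination cutoff."""
    reader = csv.DictReader(handle, delimiter="\t")
    kept = []
    for row in reader:
        completeness = float(row["Completeness"])
        contamination = float(row["Contamination"])
        if completeness > min_completeness and contamination < max_contamination:
            kept.append(row["Bin Id"])
    return kept


# Made-up example rows standing in for reports/genomic_bins_DASTool.tsv.
example = io.StringIO(
    "Bin Id\tCompleteness\tContamination\n"
    "bin_A\t98.22\t0.0\n"
    "bin_B\t16.3\t34.36\n"
    "bin_C\t72.5\t12.1\n"
)

selected = good_bins(example)
print(selected)
```

For the real table, the same function could be called on an opened file handle, and the two cutoffs can be relaxed through the keyword arguments if the defaults turn out to be too strict for novel genomes.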

I think you can continue the pipeline with --keep-going or --omit-from build_bin_report.

I guess you will have a somewhat lower mapping rate at the end, so any relative abundance would be a bit biased. But we will see.

There has been no activity for some time. I hope your issue has been solved in the meantime.
This issue will automatically close soon if no further activity occurs.

Thank you for your contributions.