metagenome-atlas/atlas

Error in rule DRAM_destill

Closed this issue · 4 comments

  • I checked and didn't find a related issue, e.g. while typing the title
  • **I got an error in the following rule(s):**
  • I checked the log files indicated in the error message (and the cluster logs if submitted to a cluster)
Error in rule DRAM_destill:
    jobid: 3238
    input: genomes/annotations/dram/annotations.tsv, genomes/annotations/dram/rrnas.tsv, genomes/annotations/dram/trnas.tsv, /home/bladen/databases/DRAM/dram_config_imported
    output: genomes/annotations/dram/distil
    log: logs/dram/distil.log (check log file(s) for error details)
    conda-env: /home/bladen/databases/conda_envs/f1e29225a050e0f1c884b25918587337_
    shell:
         DRAM.py distill  --input_file genomes/annotations/dram/annotations.tsv --rrna_path genomes/annotations/dram/rrnas.tsv --trna_path genomes/annotations/dram/trnas.tsv --output_dir genomes/annotations/dram/distil   &> logs/dram/distil.log
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Here is the relevant log output:

2023-04-18 09:33:32,357 - The log file is created at genomes/annotations/dram/distil/distill.log
2023-04-18 09:33:32,464 - Note: the fallowing id fields were not in the annotations file and are not being used: ['kegg_genes_id', 'kegg_id', 'camper_id', 'fegenie_id', 'sulfur_id', 'methyl_id'], but these are ['ko_id', 'kegg_hit', 'peptidase_family', 'cazy_best_hit', 'pfam_hits']
2023-04-18 09:33:32,487 - Retrieved database locations and descriptions
Traceback (most recent call last):
  File "/home/bladen/databases/conda_envs/f1e29225a050e0f1c884b25918587337_/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3652, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 147, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 176, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 7080, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 7088, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'scaffold'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/home/bladen/databases/conda_envs/f1e29225a050e0f1c884b25918587337_/bin/DRAM.py", line 207, in <module>
    args.func(**args_dict)
  File "/home/bladen/databases/conda_envs/f1e29225a050e0f1c884b25918587337_/lib/python3.10/site-packages/mag_annotator/summarize_genomes.py", line 670, in summarize_genomes
    genome_stats = make_genome_stats(annotations, rrna_frame, trna_frame, groupby_column=groupby_column)
  File "/home/bladen/databases/conda_envs/f1e29225a050e0f1c884b25918587337_/lib/python3.10/site-packages/mag_annotator/summarize_genomes.py", line 252, in make_genome_stats
    row.append('%s (%s, %s)' % (sixteens['scaffold'].iloc[0], sixteens.begin.iloc[0],
  File "/home/bladen/databases/conda_envs/f1e29225a050e0f1c884b25918587337_/lib/python3.10/site-packages/pandas/core/frame.py", line 3760, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/bladen/databases/conda_envs/f1e29225a050e0f1c884b25918587337_/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3654, in get_loc
    raise KeyError(key) from err
KeyError: 'scaffold'

**Atlas version**
2.15.0
Additional context

I was able to work around this issue by running DRAM.py distill without the rrnas.tsv file, i.e. without the --rrna_path argument.

In atlas 2.13 this worked automatically. In 2.14 or 2.15 (I am not sure which), however, I could not reach this step without manually adding intermediate rrnas.tsv files that were empty except for the column header.

Once those files were created, I hit a pandas KeyError in DRAM distill. I noticed the environment defaulted to pandas 2.0, so I rolled back to 1.5.1. The aggregated (concatenated) rrnas.tsv file also had formatting issues, so I corrected those as well.

Unfortunately, neither of these fixed the issue, so I omitted the rRNA file entirely and still obtained the relevant metabolic information.
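The workaround above could be sketched as a small pre-check. This is only an illustration, not part of atlas or DRAM: the helper names `rrna_table_usable` and `build_distill_command` are hypothetical, and the required column set is an assumption taken from the traceback, which shows `make_genome_stats` reading `scaffold` and `begin` from the rRNA table. DRAM may expect more columns than these. The command-line flags are the ones from the failing shell command.

```python
import csv
from pathlib import Path

# Assumption: these are the columns the traceback shows being read from the
# rRNA table; the full set DRAM expects may be larger.
REQUIRED_RRNA_COLUMNS = {"scaffold", "begin"}


def rrna_table_usable(path, required=REQUIRED_RRNA_COLUMNS):
    """Return True if the TSV exists and its header contains all required columns."""
    path = Path(path)
    if not path.is_file():
        return False
    with path.open(newline="") as handle:
        # An empty file yields an empty header, which fails the check.
        header = next(csv.reader(handle, delimiter="\t"), [])
    return required.issubset(header)


def build_distill_command(annotations, trnas, rrnas, output_dir):
    """Build the DRAM.py distill argument list (hypothetical helper).

    --rrna_path is only appended when the rRNA table passes the header check.
    """
    cmd = [
        "DRAM.py", "distill",
        "--input_file", str(annotations),
        "--trna_path", str(trnas),
        "--output_dir", str(output_dir),
    ]
    if rrna_table_usable(rrnas):
        cmd += ["--rrna_path", str(rrnas)]
    return cmd


if __name__ == "__main__":
    base = Path("genomes/annotations/dram")
    print(" ".join(build_distill_command(
        base / "annotations.tsv", base / "trnas.tsv",
        base / "rrnas.tsv", base / "distil",
    )))
```

A header check like this is only a heuristic: it catches a missing `scaffold` column, but passing it does not guarantee that distill succeeds.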

I think the issue lies in either the concat_annotation rule or the DRAM_destill rule.

Additionally, I see that DRAM can integrate GTDB-Tk and CheckM results. Is this a feature that could be implemented?

Atlas generates CheckM2 and GTDB results; it just doesn't pass them to DRAM to create the report.

I have the same bug!