Error in rule DRAM_destill
- I checked and didn't find a related issue, e.g. while typing the title
- **I got an error in the following rule(s):**
- I checked the log files indicated in the error message (and the cluster logs if submitted to a cluster)
Error in rule DRAM_destill:
jobid: 3238
input: genomes/annotations/dram/annotations.tsv, genomes/annotations/dram/rrnas.tsv, genomes/annotations/dram/trnas.tsv, /home/bladen/databases/DRAM/dram_config_imported
output: genomes/annotations/dram/distil
log: logs/dram/distil.log (check log file(s) for error details)
conda-env: /home/bladen/databases/conda_envs/f1e29225a050e0f1c884b25918587337_
shell:
DRAM.py distill --input_file genomes/annotations/dram/annotations.tsv --rrna_path genomes/annotations/dram/rrnas.tsv --trna_path genomes/annotations/dram/trnas.tsv --output_dir genomes/annotations/dram/distil &> logs/dram/distil.log
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Here is the relevant log output:
2023-04-18 09:33:32,357 - The log file is created at genomes/annotations/dram/distil/distill.log
2023-04-18 09:33:32,464 - Note: the fallowing id fields were not in the annotations file and are not being used: ['kegg_genes_id', 'kegg_id', 'camper_id', 'fegenie_id', 'sulfur_id', 'methyl_id'], but these are ['ko_id', 'kegg_hit', 'peptidase_family', 'cazy_best_hit', 'pfam_hits']
2023-04-18 09:33:32,487 - Retrieved database locations and descriptions
Traceback (most recent call last):
File "/home/bladen/databases/conda_envs/f1e29225a050e0f1c884b25918587337_/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3652, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 147, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 176, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 7080, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 7088, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'scaffold'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/bladen/databases/conda_envs/f1e29225a050e0f1c884b25918587337_/bin/DRAM.py", line 207, in <module>
args.func(**args_dict)
File "/home/bladen/databases/conda_envs/f1e29225a050e0f1c884b25918587337_/lib/python3.10/site-packages/mag_annotator/summarize_genomes.py", line 670, in summarize_genomes
genome_stats = make_genome_stats(annotations, rrna_frame, trna_frame, groupby_column=groupby_column)
File "/home/bladen/databases/conda_envs/f1e29225a050e0f1c884b25918587337_/lib/python3.10/site-packages/mag_annotator/summarize_genomes.py", line 252, in make_genome_stats
row.append('%s (%s, %s)' % (sixteens['scaffold'].iloc[0], sixteens.begin.iloc[0],
File "/home/bladen/databases/conda_envs/f1e29225a050e0f1c884b25918587337_/lib/python3.10/site-packages/pandas/core/frame.py", line 3760, in __getitem__
indexer = self.columns.get_loc(key)
File "/home/bladen/databases/conda_envs/f1e29225a050e0f1c884b25918587337_/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3654, in get_loc
raise KeyError(key) from err
KeyError: 'scaffold'
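The traceback shows `make_genome_stats` reading `sixteens['scaffold']` and `sixteens.begin` from the rRNA table, so the `KeyError: 'scaffold'` means the rrnas.tsv that DRAM loaded has no column named `scaffold`. A minimal sketch for checking the header before running distill, assuming rrnas.tsv is tab-separated with the header on its first line. The helper `missing_rrna_columns` is hypothetical, and it only checks the two columns visible in the traceback; DRAM may require others.

```python
import csv
import io
from typing import Iterable, List, TextIO

# Columns that make_genome_stats accesses on the rRNA table, per the traceback.
# Assumption: DRAM may need more columns than these two.
REQUIRED_RRNA_COLUMNS = ("scaffold", "begin")


def missing_rrna_columns(handle: TextIO, required: Iterable[str] = REQUIRED_RRNA_COLUMNS) -> List[str]:
    """Return the required column names absent from the header row of a TSV stream (hypothetical helper)."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, [])  # an empty file yields an empty header
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


if __name__ == "__main__":
    # Illustrative header whose first column is unnamed, so "scaffold" is absent.
    broken = io.StringIO("\tfasta\tbegin\tend\n")
    print(missing_rrna_columns(broken))
```

Against a real file, something like `with open("genomes/annotations/dram/rrnas.tsv") as fh: print(missing_rrna_columns(fh))` would list any missing names; an empty list means the columns from the traceback are at least present.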
**Atlas version**
2.15.0
Additional context
I was able to work around this issue by running DRAM.py distill without the rrnas.tsv file.
In Atlas 2.13 this worked automatically; however, with either 2.14 or 2.15, I could not reach this step without manually adding intermediate rrnas.tsv files that were empty except for the column header.
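The workaround described above, creating an rrnas.tsv that holds only a header row, can be sketched with the standard library. The column list is hypothetical: apart from `scaffold` and `begin`, which appear in the traceback, the names are placeholders, so compare them with the header DRAM actually writes before relying on this.

```python
import csv
import io
from typing import Sequence, TextIO

# Hypothetical header: only "scaffold" and "begin" are confirmed by the traceback;
# the other names are placeholders for DRAM's real rRNA columns.
PLACEHOLDER_RRNA_HEADER = ("scaffold", "begin", "end", "strand", "type")


def write_header_only_tsv(handle: TextIO, columns: Sequence[str] = PLACEHOLDER_RRNA_HEADER) -> None:
    """Write a tab-separated header row and no data rows."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)


if __name__ == "__main__":
    buffer = io.StringIO()
    write_header_only_tsv(buffer)
    print(repr(buffer.getvalue()))
```

For a file on disk, open it with `open(path, "w", newline="")` and pass the handle in.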
However, once these files were created, I hit a pandas KeyError in DRAM distill. I noticed the environment defaulted to pandas 2.0, so I rolled back to 1.5.1. I also had formatting issues with the aggregated (concatenated) rrnas.tsv file, so I corrected that as well.
Unfortunately, neither of these fixed the issue, so I omitted the rRNA file entirely and still obtained the relevant metabolic information.
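Omitting the rRNA table amounts to dropping `--rrna_path` from the distill command shown in the error message. A sketch that builds that argument list and leaves out `--rrna_path` when the file is missing or has no data rows. The helper names and the "no data rows" rule are assumptions; only the flags already present in the failing command are used.

```python
import os
from typing import List


def has_data_rows(path: str) -> bool:
    """True if the TSV at `path` exists and has at least one non-blank line after its header."""
    if not os.path.exists(path):
        return False
    with open(path) as handle:
        next(handle, None)  # skip the header line
        return any(line.strip() for line in handle)


def build_distill_command(input_file: str, rrna_path: str, trna_path: str, output_dir: str) -> List[str]:
    """Build the DRAM.py distill argv, omitting --rrna_path when the rRNA table is missing or empty."""
    cmd = ["DRAM.py", "distill", "--input_file", input_file]
    if has_data_rows(rrna_path):
        cmd += ["--rrna_path", rrna_path]
    cmd += ["--trna_path", trna_path, "--output_dir", output_dir]
    return cmd
```

The returned list could be passed to `subprocess.run(cmd, check=True)`. Building an argv list rather than a shell string avoids quoting problems with paths.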
I think the issue lies in either the concat_annotation rule or the DRAM distill rule.
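Since the aggregation step is one suspect, here is a sketch of concatenating several TSV tables while keeping exactly one header row and rejecting inputs whose header differs from the first. This is a hypothetical stand-alone helper, not Atlas's concat_annotation rule; header-only inputs simply contribute no rows.

```python
import csv
import io
from typing import Iterable, List, Optional, TextIO


def concat_tsvs(inputs: Iterable[TextIO], output: TextIO) -> int:
    """Concatenate TSV streams that share one header; return the number of data rows written."""
    writer = csv.writer(output, delimiter="\t", lineterminator="\n")
    header: Optional[List[str]] = None
    rows_written = 0
    for handle in inputs:
        reader = csv.reader(handle, delimiter="\t")
        current = next(reader, None)
        if current is None:
            continue  # completely empty file: nothing to add
        if header is None:
            header = current
            writer.writerow(header)  # write the header only once
        elif current != header:
            raise ValueError(f"header mismatch: {current} != {header}")
        for row in reader:
            if row:  # skip blank lines
                writer.writerow(row)
                rows_written += 1
    return rows_written


if __name__ == "__main__":
    a = io.StringIO("scaffold\tbegin\nc1\t5\n")
    b = io.StringIO("scaffold\tbegin\n")  # header-only input
    out = io.StringIO()
    print(concat_tsvs([a, b], out), repr(out.getvalue()))
```

Failing loudly on a header mismatch makes a malformed aggregate surface at concatenation time instead of as a KeyError later in distill.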
Additionally, I see that DRAM can integrate GTDB-Tk and CheckM results. Is this a feature that could be implemented?
Atlas already generates CheckM2 and GTDB results; it just doesn't pass them to DRAM to create the report.
I have the same bug!