Error in rule get_all_modules
- I checked and didn't find a related issue, e.g. while typing the title
- **I got an error in the following rule(s):**
- I checked the log files indicated in the error message (and the cluster logs if submitted to a cluster)
Here is the relevant log output:
2023-05-13 04:39:11 Uncaught exception: Traceback (most recent call last):
File "/projects/com_perkinsd/common/qc-antibiotics-atlas/.snakemake/scripts/tmp5iy1gxw3.DRAM_get_all_modules.py", line 58, in <module>
module_steps_form = pd.read_csv(
File "/projects/com_perkinsd/common/databases/conda_envs/a779e7ab5b5ee88b6a071a9705d2d44a_/lib/python3.10/site-packages/pandas/util/_decorators.py", line 211, in wrapper
return func(*args, **kwargs)
File "/projects/com_perkinsd/common/databases/conda_envs/a779e7ab5b5ee88b6a071a9705d2d44a_/lib/python3.10/site-packages/pandas/util/_decorators.py", line 331, in wrapper
return func(*args, **kwargs)
File "/projects/com_perkinsd/common/databases/conda_envs/a779e7ab5b5ee88b6a071a9705d2d44a_/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 950, in read_csv
return _read(filepath_or_buffer, kwds)
File "/projects/com_perkinsd/common/databases/conda_envs/a779e7ab5b5ee88b6a071a9705d2d44a_/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 605, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/projects/com_perkinsd/common/databases/conda_envs/a779e7ab5b5ee88b6a071a9705d2d44a_/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1442, in __init__
self._engine = self._make_engine(f, self.engine)
File "/projects/com_perkinsd/common/databases/conda_envs/a779e7ab5b5ee88b6a071a9705d2d44a_/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1735, in _make_engine
self.handles = get_handle(
File "/projects/com_perkinsd/common/databases/conda_envs/a779e7ab5b5ee88b6a071a9705d2d44a_/lib/python3.10/site-packages/pandas/io/common.py", line 713, in get_handle
ioargs = _get_filepath_or_buffer(
File "/projects/com_perkinsd/common/databases/conda_envs/a779e7ab5b5ee88b6a071a9705d2d44a_/lib/python3.10/site-packages/pandas/io/common.py", line 451, in _get_filepath_or_buffer
raise ValueError(msg)
ValueError: Invalid file path or buffer object type: <class 'NoneType'>
It is probable that I have an error in my code. Could you please run atlas on the test data? It worked on my side. https://zenodo.org/record/3992790/files/test_reads.tar.gz
Could you also check your `genomes/annotations/dram/annotations.tsv`?
Here is the head of `genomes/annotations/dram/annotations.tsv`:
gene_position rank strandedness end_position start_position fasta scaffold heme_regulatory_motif_count
MAG18_MAG18_1_1 1 E 1 204 1 MAG18 MAG18_1 0
MAG18_MAG18_1_2 2 E 1 902 207 MAG18 MAG18_1 0
MAG18_MAG18_1_3 3 E -1 3135 1042 MAG18 MAG18_1 0
MAG18_MAG18_1_4 4 E -1 3659 3393 MAG18 MAG18_1 0
MAG18_MAG18_1_5 5 E -1 3999 3811 MAG18 MAG18_1 0
MAG18_MAG18_10_1 1 E 1 659 30 MAG18 MAG18_10 0
MAG18_MAG18_10_2 2 E 1 1319 885 MAG18 MAG18_10 0
MAG18_MAG18_10_3 3 E 1 1996 1316 MAG18 MAG18_10 0
MAG18_MAG18_10_4 4 E 1 3796 2396 MAG18 MAG18_10 0
I received the same DRAM errors as before. However, there were no errors on the genecatalog side of things. I'm going to try re-downloading the DRAM database.
I'm seeing this on atlas v2.15.0 as well. I think it may be related to DRAM not getting its configuration file, which specifies all the required resources. I tried setting `DRAM_CONFIG_LOCATION` to `DRAM/DRAM.config` under the `database_dir` set in my atlas config file, and that bypassed the first error reported here (`ValueError: Invalid file path or buffer object type: <class 'NoneType'>`).
Now I instead run into
2023-05-21 08:23:55 Note: NumExpr detected 16 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2023-05-21 08:24:19 Uncaught exception: Traceback (most recent call last):
File "/crex/proj/snic2020-5-486/nobackup/SMS-23-6668-micegut/resources/conda_envs/9f41e817c598c12d8afe52ac2a7750e1_/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3652, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 147, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 176, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 7080, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 7088, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'ko_id'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/crex/proj/snic2020-5-486/nobackup/SMS-23-6668-micegut/atlas/.snakemake/scripts/tmpfnhn5496.DRAM_get_all_modules.py", line 67, in <module>
module_coverage_frame = make_module_coverage_frame(
File "/crex/proj/snic2020-5-486/nobackup/SMS-23-6668-micegut/resources/conda_envs/9f41e817c598c12d8afe52ac2a7750e1_/lib/python3.10/site-packages/mag_annotator/summarize_genomes.py", line 340, in make_module_coverage_frame
module_coverage_dict[group] = make_module_coverage_df(frame, module_nets)
File "/crex/proj/snic2020-5-486/nobackup/SMS-23-6668-micegut/resources/conda_envs/9f41e817c598c12d8afe52ac2a7750e1_/lib/python3.10/site-packages/mag_annotator/summarize_genomes.py", line 319, in make_module_coverage_df
for gene_id, ko_list in annotation_df[ko_id_name].items():
File "/crex/proj/snic2020-5-486/nobackup/SMS-23-6668-micegut/resources/conda_envs/9f41e817c598c12d8afe52ac2a7750e1_/lib/python3.10/site-packages/pandas/core/frame.py", line 3761, in __getitem__
indexer = self.columns.get_loc(key)
File "/crex/proj/snic2020-5-486/nobackup/SMS-23-6668-micegut/resources/conda_envs/9f41e817c598c12d8afe52ac2a7750e1_/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3654, in get_loc
raise KeyError(key) from err
KeyError: 'ko_id'
I think DRAM expects a `kegg_id` or `ko_id` column in the annotation file. The DRAM log says `No KEGG source provided so distillation will be of limited use.`, so I guess the missing DRAM config environment variable was causing issues upstream of `get_all_modules`. I'm trying to rerun the DRAM annotation steps to see if I get KEGG IDs included in the output.
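For anyone who wants to check their own file before rerunning, here is a minimal sketch that reads only the header of a tab-separated annotations file and reports which of the two column names mentioned above are present. The function name `find_kegg_columns` is made up for this example, and the sample header is copied from the `annotations.tsv` excerpt earlier in the thread (assuming the unnamed first column holds the gene IDs). It only checks for the column names; it does not confirm what DRAM itself accepts.

```python
import csv
import io

# Column names mentioned in this thread; other names are not considered here.
KEGG_COLUMNS = ("ko_id", "kegg_id")


def find_kegg_columns(handle):
    """Return the KEGG-style column names found in a tab-separated header line."""
    header = next(csv.reader(handle, delimiter="\t"), [])
    return [name for name in KEGG_COLUMNS if name in header]


# Header as shown in the annotations.tsv excerpt above (first column unnamed).
sample = io.StringIO(
    "\tgene_position\trank\tstrandedness\tend_position\tstart_position"
    "\tfasta\tscaffold\theme_regulatory_motif_count\n"
)
print(find_kegg_columns(sample))  # prints [] because neither column is present
```

To check a real file, open it and pass the handle, for example `with open("genomes/annotations/dram/annotations.tsv", newline="") as fh: print(find_kegg_columns(fh))`. An empty list would be consistent with the `KeyError: 'ko_id'` above.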
Passing the configuration file to the `DRAM_annotate`, `DRAM_destill` and `get_all_modules` rules fixes the issue for me.
I ran into this today. How do I pass the config file directly to those rules?
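The thread does not show how the rules were edited, so the following is not that fix. It is only a rough sketch of the environment-variable workaround described earlier: locate `DRAM/DRAM.config` under the `database_dir` and export it as `DRAM_CONFIG_LOCATION` before starting the workflow. The helper names `dram_config_path` and `export_dram_config` are hypothetical, and it assumes the workflow is launched as a child process that inherits the environment.

```python
import os
from pathlib import Path


def dram_config_path(database_dir):
    """Build the config path described in the thread: <database_dir>/DRAM/DRAM.config."""
    return Path(database_dir) / "DRAM" / "DRAM.config"


def export_dram_config(database_dir, env=os.environ):
    """Set DRAM_CONFIG_LOCATION in `env`, failing early if the file is missing."""
    config = dram_config_path(database_dir)
    if not config.is_file():
        # A missing config is the situation this thread associates with the errors above.
        raise FileNotFoundError(f"DRAM config not found: {config}")
    env["DRAM_CONFIG_LOCATION"] = str(config)
    return config
```

Calling `export_dram_config("/path/to/database_dir")` and then starting atlas from the same process (for example via `subprocess.run`) would hand the variable to the child process. Whether every DRAM rule picks it up is not confirmed here; the earlier comment only reports that it bypassed the first error.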