[BUG] `KeyError: 'reference_database_name'` when running summarize
Closed this issue · 9 comments
Describe the bug
I get the following error when running with summarize
Warning: <_io.TextIOWrapper name='WAL001-megahit.mapping.potential.ARG.deeparg.json' mode='r' encoding='UTF-8'> report is empty
Traceback (most recent call last):
File "/home/jfellows/ccdata/users/JFellows/bin/miniconda3/envs/hamronization/bin/hamronize", line 8, in <module>
sys.exit(main())
File "/home/jfellows/ccdata/users/JFellows/bin/miniconda3/envs/hamronization/lib/python3.10/site-packages/hAMRonization/hamronize.py", line 7, in main
hAMRonization.Interfaces.generic_cli_interface()
File "/home/jfellows/ccdata/users/JFellows/bin/miniconda3/envs/hamronization/lib/python3.10/site-packages/hAMRonization/Interfaces.py", line 299, in generic_cli_interface
hAMRonization.summarize.summarize_reports(
File "/home/jfellows/ccdata/users/JFellows/bin/miniconda3/envs/hamronization/lib/python3.10/site-packages/hAMRonization/summarize.py", line 752, in summarize_reports
combined_reports = combined_reports.sort_values(
File "/home/jfellows/ccdata/users/JFellows/bin/miniconda3/envs/hamronization/lib/python3.10/site-packages/pandas/util/_decorators.py", line 317, in wrapper
return func(*args, **kwargs)
File "/home/jfellows/ccdata/users/JFellows/bin/miniconda3/envs/hamronization/lib/python3.10/site-packages/pandas/core/frame.py", line 6886, in sort_values
keys = [self._get_label_or_level_values(x, axis=axis) for x in by]
File "/home/jfellows/ccdata/users/JFellows/bin/miniconda3/envs/hamronization/lib/python3.10/site-packages/pandas/core/frame.py", line 6886, in <listcomp>
keys = [self._get_label_or_level_values(x, axis=axis) for x in by]
File "/home/jfellows/ccdata/users/JFellows/bin/miniconda3/envs/hamronization/lib/python3.10/site-packages/pandas/core/generic.py", line 1849, in _get_label_or_level_values
raise KeyError(key)
KeyError: 'reference_database_name'
Input
hamronize \
summarize \
<huge_list_of_jsons> \
-t interactive \
\
-o hamronization_combined_report.html
Input file
I can send a zip of the entire set of input files privately if necessary (it includes unpublished data)
Error log
See above
hAMRonization Version
1.1.0
Desktop (please complete the following information):
- OS: SUSE Linux Enterprise High Performance Computing 15 SP1
- Version: hAMRonization 1.1.0
Ah damn, sorry, I thought I'd solved that issue/covered it with tests. I thought the concatenation should be adding those fields but I'll try initialising the empty combined dataframe earlier.
Does the same error occur if you run `hamronize summarize` on just `WAL001-megahit.mapping.potential.ARG.deeparg.json`, or only with the big list of JSONs?
If the former, could you send me just that output file (to finlay.maguire@dal.ca), and if the latter, the big ole zip?
Yes, it happens with only WAL001-megahit, but strangely it seems to happen in all cases, e.g. with `VLC009-metaspades.mapping.potential.ARG.deeparg.json`, which does have hits.
I'll send you the zip and you can test everything. Happy to also test any dev versions!
This seems to work now but please test in your workflow. Instead of trying to add additional columns if needed post-concatenation, I now just initialise an empty dataframe with all the headers in summarize before concatenating.
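For anyone hitting the same traceback, here is a minimal sketch of the idea described above, not the actual code in `summarize.py`. The column list and the `combine_reports` helper are hypothetical stand-ins; the real harmonized report has many more fields.

```python
import pandas as pd

# Hypothetical subset of the harmonized report columns (assumption for illustration).
EXPECTED_COLUMNS = ["input_file_name", "gene_symbol", "reference_database_name"]


def combine_reports(reports):
    """Concatenate report frames onto an empty frame that already has every header."""
    # Starting from an empty frame with all headers means the sort keys exist
    # even when every input report is empty.
    skeleton = pd.DataFrame(columns=EXPECTED_COLUMNS)
    combined = pd.concat([skeleton, *reports], ignore_index=True)
    return combined.sort_values(by=["input_file_name", "reference_database_name"])


# An empty JSON report ("[]") loads as a frame with no columns at all, so
# sorting it directly by "reference_database_name" raises KeyError.
empty_report = pd.DataFrame([])
result = combine_reports([empty_report])
```

Adding missing columns after concatenation needs a separate code path for the all-empty case, whereas the pre-initialised frame handles empty and non-empty inputs the same way.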
One question to make sure I haven't failed to fix another issue: these input JSONs to summarize were cached from an earlier run and not regenerated with hAMRonization v1.1.0, right? hAMRonization should now be generating valid empty JSONs (i.e., just files containing `[]`) when parsing empty tool reports, but I see these files still have the `]` malformation.
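A quick way to check whether a cached report is a valid empty JSON rather than a malformed one is sketched below. `is_valid_empty_report` is a hypothetical helper, not part of hAMRonization, and it assumes the malformed files fail to parse as JSON (for example, a file containing only `]`).

```python
import json


def is_valid_empty_report(text):
    """Return True only if text parses as JSON and is an empty list."""
    try:
        return json.loads(text) == []
    except json.JSONDecodeError:
        # Malformed content, such as a lone "]", ends up here.
        return False
```

Reading each cached file and passing its contents to this function would flag the reports that need regenerating with the newer version.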
- OK! I will test this :)
- Ah yes, correct, sorry: the JSONs in the ZIP were still from 1.0.3. It took a few days for the pipeline to run, so I didn't want to run the whole thing again with 1.1.0 only to find the same or a different summarize issue. I can take a few and regenerate them with 1.1.0 to double-check now, though.
@fmaguire I can confirm 761fe77 fixes the bug, and that re-running e.g. `hamronize deeparg` on an 'empty' output file produces the correct empty JSON of `[]`.
Once this version is released on bioconda (I sped that up for 1.1.0 this morning btw), I will update our nf-core nextflow module and re-run the full pipeline again and let you know how well it performs.
This issue can be closed now!
oh and thanks for the quick turnaround :D
Great! Thanks for your patience!
It should already be automatically updating on PyPI and Docker Hub, and (pending the attentiveness of the bioconda bot) on bioconda at some point today.
(although it does seem the badges on the README aren't updating for some reason...)
Updated in bioconda now: bioconda/bioconda-recipes#37140