pha4ge/hAMRonization

[BUG] `KeyError: 'reference_database_name'` when running summarize

Closed this issue · 9 comments

Describe the bug

I get the following error when running `hamronize summarize`:

Warning: <_io.TextIOWrapper name='WAL001-megahit.mapping.potential.ARG.deeparg.json' mode='r' encoding='UTF-8'> report is empty
Traceback (most recent call last):
  File "/home/jfellows/ccdata/users/JFellows/bin/miniconda3/envs/hamronization/bin/hamronize", line 8, in <module>
    sys.exit(main())
  File "/home/jfellows/ccdata/users/JFellows/bin/miniconda3/envs/hamronization/lib/python3.10/site-packages/hAMRonization/hamronize.py", line 7, in main
    hAMRonization.Interfaces.generic_cli_interface()
  File "/home/jfellows/ccdata/users/JFellows/bin/miniconda3/envs/hamronization/lib/python3.10/site-packages/hAMRonization/Interfaces.py", line 299, in generic_cli_interface
    hAMRonization.summarize.summarize_reports(
  File "/home/jfellows/ccdata/users/JFellows/bin/miniconda3/envs/hamronization/lib/python3.10/site-packages/hAMRonization/summarize.py", line 752, in summarize_reports
    combined_reports = combined_reports.sort_values(
  File "/home/jfellows/ccdata/users/JFellows/bin/miniconda3/envs/hamronization/lib/python3.10/site-packages/pandas/util/_decorators.py", line 317, in wrapper
    return func(*args, **kwargs)
  File "/home/jfellows/ccdata/users/JFellows/bin/miniconda3/envs/hamronization/lib/python3.10/site-packages/pandas/core/frame.py", line 6886, in sort_values
    keys = [self._get_label_or_level_values(x, axis=axis) for x in by]
  File "/home/jfellows/ccdata/users/JFellows/bin/miniconda3/envs/hamronization/lib/python3.10/site-packages/pandas/core/frame.py", line 6886, in <listcomp>
    keys = [self._get_label_or_level_values(x, axis=axis) for x in by]
  File "/home/jfellows/ccdata/users/JFellows/bin/miniconda3/envs/hamronization/lib/python3.10/site-packages/pandas/core/generic.py", line 1849, in _get_label_or_level_values
    raise KeyError(key)
KeyError: 'reference_database_name'

Input

hamronize \
    summarize \
    <huge_list_of_jsons> \
    -t interactive \
     \
    -o hamronization_combined_report.html

Input file
I can send a zip of all the input files privately if necessary (it includes unpublished data).

Error log
See above

hAMRonization Version
1.1.0


Desktop:

  • OS: SUSE Linux Enterprise High Performance Computing 15 SP1
  • Version: hAMRonization 1.1.0


Ah damn, sorry, I thought I'd fixed that issue and covered it with tests. I expected the concatenation to add those fields, but I'll try initialising the empty combined dataframe earlier.
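
A minimal reproduction of the failure mode discussed here, assuming (as the "report is empty" warning suggests) that every parsed report comes back as a DataFrame with no columns. This is not the hAMRonization source; `reports` and `error_message` are hypothetical names. Concatenating column-less frames cannot introduce `reference_database_name`, so the following `sort_values` raises the same `KeyError` as in the traceback.

```python
import pandas as pd

# Hypothetical stand-ins for parsed reports: every report was empty,
# so each parses to a DataFrame with no rows and no columns.
reports = [pd.DataFrame(), pd.DataFrame()]

# pd.concat only unions the columns its inputs already have,
# so the combined frame ends up with no columns at all.
combined_reports = pd.concat(reports)

error_message = None
try:
    # Sorting by a column that does not exist raises KeyError,
    # mirroring the sort_values call in the traceback.
    combined_reports.sort_values(by=["reference_database_name"])
except KeyError as err:
    error_message = str(err)

print(list(combined_reports.columns), error_message)
```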

Does the same error occur if you run `hamronize summarize` on just WAL001-megahit.mapping.potential.ARG.deeparg.json, or only with the big list of JSONs?

If the former, could you send me just that output file (to finlay.maguire@dal.ca)? If the latter, the big ole zip?

Yes, it happens with just WAL001-megahit, but strangely it seems to happen in all cases, e.g. with VLC009-metaspades.mapping.potential.ARG.deeparg.json, which does have hits.

I'll send you the zip and you can test everything. Happy to also test any dev versions!

This seems to work now, but please test it in your workflow. Instead of trying to add any missing columns after concatenation, I now initialise an empty dataframe with all the headers in summarize before concatenating.
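
A sketch of that approach, not the actual summarize code: seed the concatenation with an empty DataFrame that already has every expected header. `HEADERS` is a hypothetical, shortened header list (only `reference_database_name` is confirmed by the traceback above), `combine_reports` is an illustrative helper name, and the example row is made up.

```python
import pandas as pd

# Hypothetical, shortened list of output headers; the real list is longer.
HEADERS = [
    "input_file_name",
    "gene_symbol",
    "reference_database_name",
    "analysis_software_name",
]


def combine_reports(reports):
    """Concatenate parsed reports, guaranteeing every header exists."""
    # Seeding the concatenation with an empty frame that already carries
    # all headers means the columns survive even if every report is empty.
    combined = pd.concat([pd.DataFrame(columns=HEADERS), *reports], ignore_index=True)
    # The sort keys are now always present, even with zero rows.
    return combined.sort_values(by=["input_file_name", "reference_database_name"])


# Usage: two empty reports, then one empty report plus a single made-up row.
hit = pd.DataFrame([{
    "input_file_name": "sample_B",
    "gene_symbol": "example_gene",
    "reference_database_name": "example_db",
    "analysis_software_name": "deeparg",
}])
print(combine_reports([pd.DataFrame(), pd.DataFrame()]).shape)  # → (0, 4)
print(combine_reports([pd.DataFrame(), hit]).shape)  # → (1, 4)
```

Compared with patching in missing columns after concatenation, this guarantees the sort keys exist before `sort_values` runs, whatever the individual reports look like.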

One question, to make sure I haven't failed to fix another issue: these input JSONs to summarize were cached rather than regenerated with hAMRonization v1.1.0, right? hAMRonization should now generate valid empty JSONs (i.e., files containing just []) when parsing empty tool reports, but I see these files still have the ] malformation.
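
For spotting stale files like these, a small check along the following lines could help. It is a sketch, assuming each report is meant to hold a single JSON array (`[]` when the tool found nothing) and that the old malformation is a file that does not parse as JSON, such as one containing only a stray `]`. `classify_report_text` and `classify_report` are hypothetical helpers, not part of hAMRonization.

```python
import json
from pathlib import Path


def classify_report_text(text):
    """Classify report contents as 'empty', 'has_hits' or 'malformed'."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # e.g. a file left holding only a stray "]", or an empty file
        return "malformed"
    if not isinstance(data, list):
        # Assumption: a valid report is always a JSON array.
        return "malformed"
    return "has_hits" if data else "empty"


def classify_report(path):
    """Read a report file and classify its contents."""
    return classify_report_text(Path(path).read_text(encoding="utf-8"))


print(classify_report_text("[]"))  # → empty
print(classify_report_text("]"))  # → malformed
print(classify_report_text('[{"gene_symbol": "example_gene"}]'))  # → has_hits
```

Running `classify_report` over the inputs to summarize would flag any empty reports written by older versions that still need regenerating.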

  1. OK! I will test this :)
  2. Ah yes, correct, sorry: the JSONs in the ZIP were still from 1.0.3. The pipeline took a few days to run, so I didn't want to run the whole thing again with 1.1.0 only to hit the same (or a different) summarize issue 😅. I can take a few and regenerate them with 1.1.0 to double-check now, though.

@fmaguire I can confirm 761fe77 fixes the bug, and that re-running e.g. hamronize deeparg on an 'empty' output file produces the correct empty JSON of [].

Once this version is released on bioconda (I sped that up for 1.1.0 this morning btw 😬), I will update our nf-core Nextflow module, re-run the full pipeline, and let you know how well it performs.

This issue can be closed now!

oh and thanks for the quick turnaround :D

Great! Thanks for your patience!

It should already be updating automatically on PyPI and Docker Hub, and (pending the attentiveness of the bioconda bot) on bioconda at some point today.

(although it does seem the badges on the README aren't updating for some reason...)

Updated in bioconda now: bioconda/bioconda-recipes#37140