NOAA-GFDL/MDTF-diagnostics

Attribute error in multirun config w/CESM test

Closed this issue · 11 comments

@jess Is this a good place to bring up debugging issues? If not, please feel free to refer me elsewhere.

I'm trying to initiate a multirun run and am getting an error, and I'm not sure whether it is related to the case information or the POD settings. All I've done so far is start passing the case information with the multirun jsonc file.

Can you tell me where to start looking to debug this?
Uncaught exception:
Traceback (most recent call last):
  File "/glade/u/home/bundy/mdtf/MDTF_3_main/MDTF-diagnostics/mdtf_framework.py", line 68, in <module>
    exit_code = main(argv)
  File "/glade/u/home/bundy/mdtf/MDTF_3_main/MDTF-diagnostics/mdtf_framework.py", line 62, in main
    exit_code = framework.main()
  File "/glade/u/home/bundy/mdtf/MDTF_3_main/MDTF-diagnostics/src/core.py", line 1204, in main
    pod_dict[pod].setup_pod()
  File "/glade/u/home/bundy/mdtf/MDTF_3_main/MDTF-diagnostics/src/pod_setup.py", line 139, in setup_pod
    self.preprocessor.edit_request(self)
  File "/glade/u/home/bundy/mdtf/MDTF_3_main/MDTF-diagnostics/src/preprocessor.py", line 1256, in edit_request
    func.edit_request(data_mgr, *args)
  File "/glade/u/home/bundy/mdtf/MDTF_3_main/MDTF-diagnostics/src/preprocessor.py", line 116, in wrapped_edit_request
    new_v = multirun_wrapped_edit_request_func(self, v, data_mgr)
  File "/glade/u/home/bundy/mdtf/MDTF_3_main/MDTF-diagnostics/src/preprocessor.py", line 1450, in edit_request
    data_mgr.attrs.convention, v.standard_name, new_ax_set
AttributeError: 'MultirunDiagnostic' object has no attribute 'attrs'
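
A generic way to start looking at an uncaught exception like this, independent of MDTF, is Python's post-mortem debugger, which shows what data_mgr actually is in the frame at preprocessor.py line 1450. Below is a minimal, self-contained sketch of the pattern; FailingObject is a dummy stand-in for the real call, not framework code.

# Sketch only: FailingObject mimics an object that, like the
# MultirunDiagnostic in the trace above, has no `attrs` attribute.
import pdb
import traceback


class FailingObject:
    """Dummy object with no .attrs."""


def failing_call():
    data_mgr = FailingObject()
    return data_mgr.attrs.convention  # raises AttributeError, as in the trace


try:
    failing_call()
except AttributeError:
    traceback.print_exc()
    # Uncomment for an interactive post-mortem: use `u`/`d` to move between
    # frames and `p data_mgr` / `p dir(data_mgr)` to see what the object holds.
    # pdb.post_mortem()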

It's not completely up-to-date code, but it doesn't look like the multirun code has changed between that commit and now.
83e18c5 - Dani Coleman, 7 weeks ago : Merge branch 'NOAA-GFDL:main' into master
97c2417 - Jess, 8 weeks ago : Add instructions to include POD figs in PR

From my jsonc (sorry, I can't seem to upload the whole thing!)
"pod_list" : [
"blocking_neale"
],
// Each CASENAME corresponds to a different simulation/output dataset
"case_list" : [
{
"CASENAME" : "cesm_mdtfv3_timeslice",
"model" : "CESM",
"convention" : "CESM",
"FIRSTYR" : 2013,
"LASTYR" : 2013
},
{
"CASENAME" : "QBOi.EXP1.AMIP.001",
"model" : "CESM",
"convention" : "CESM",
"FIRSTYR" : 1977,
"LASTYR" : 1981
}
],

Originally posted by @bitterbark in #313 (comment)
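
For reference, here is a minimal sketch of how a case_list block like the one quoted above can be read and sanity-checked. This is illustrative only and is not the framework's own jsonc handling; it simply strips the //-style line comments that make the file jsonc rather than plain JSON, parses the rest, and checks each case entry.

# Illustrative only; the MDTF framework has its own jsonc parser.
import json
import re

JSONC_SNIPPET = """
{
    // Each CASENAME corresponds to a different simulation/output dataset
    "case_list": [
        {"CASENAME": "cesm_mdtfv3_timeslice", "model": "CESM",
         "convention": "CESM", "FIRSTYR": 2013, "LASTYR": 2013},
        {"CASENAME": "QBOi.EXP1.AMIP.001", "model": "CESM",
         "convention": "CESM", "FIRSTYR": 1977, "LASTYR": 1981}
    ]
}
"""


def load_jsonc(text):
    """Remove //-style line comments, then parse as ordinary JSON."""
    no_comments = re.sub(r"^\s*//.*$", "", text, flags=re.MULTILINE)
    return json.loads(no_comments)


config = load_jsonc(JSONC_SNIPPET)
for case in config["case_list"]:
    assert case["FIRSTYR"] <= case["LASTYR"], case["CASENAME"]
    print(case["CASENAME"], case["convention"], case["FIRSTYR"], case["LASTYR"])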

@bitterbark There have been several commits between the one merged into your working branch in February and the current commit; they update and fix code in the modules used by the multirun configuration. In particular, there are a few xarray-related bugs that may be propagating into the modules in your stack trace. I recommend pulling the latest main branch into your working branch. Unfortunately, this means you'll have to update the base and python3_base environment files (not the answer you want, but it will need to be done eventually).

If you want, I can test this out on my machine. Let me know if you're using the existing Blocking Neale POD, or provide your remote branch with updates if you are testing a modified version.

I'll update and try again; I thought it looked orthogonal! Thanks.

Unfortunately, the update didn't fix the error.
To repeat, the final message is:

File "/glade/u/home/bundy/mdtf/MDTF_3_main/MDTF-diagnostics/src/preprocessor.py", line 1450, in edit_request
data_mgr.attrs.convention, v.standard_name, new_ax_set
AttributeError: 'MultirunDiagnostic' object has no attribute 'attrs'

I'm tracking down how the attributes are set in the multi- vs. single-case code.
It seems to be coming from this call in preprocessor.py, in the edit_request method of the decorated class:

@multirun_edit_request_wrapper
class MultirunExtractLevelFunction(ExtractLevelFunction):
    ...
    # in edit_request:
    new_tv_name = core.VariableTranslator().from_CF_name(
        data_mgr.attrs.convention, v.standard_name, new_ax_set
    )
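
To make the mismatch concrete, here is a stripped-down sketch of the failure mode; the classes below are stand-ins, not the framework's actual classes. In single-run mode the object reaching this code carries the case attributes, while in multirun mode the hook receives a diagnostic-like object with no attrs, so data_mgr.attrs.convention fails exactly as in the traceback. A guarded lookup is one hedged way around it.

# Stand-in classes only; the real MDTF objects differ.
from types import SimpleNamespace


class SingleRunDataSource:
    """Mimics a single-case data manager: carries case attrs."""
    def __init__(self, convention):
        self.attrs = SimpleNamespace(convention=convention)


class MultirunDiagnostic:
    """Mimics the multirun POD object: no .attrs; cases live elsewhere."""
    def __init__(self, cases):
        self.cases = cases


def convention_of(data_mgr):
    # Single-case logic from the traceback: assumes data_mgr.attrs exists.
    return data_mgr.attrs.convention


single = SingleRunDataSource(convention="CESM")
print(convention_of(single))         # CESM

multi = MultirunDiagnostic(cases={"QBOi.EXP1.AMIP.001": single})
try:
    print(convention_of(multi))      # reproduces the AttributeError
except AttributeError as exc:
    print("AttributeError:", exc)


def resolve_convention(obj):
    """Hedged fix sketch: fall back to a per-case object's attrs."""
    if hasattr(obj, "attrs"):
        return obj.attrs.convention
    first_case = next(iter(obj.cases.values()))
    return first_case.attrs.convention


print(resolve_convention(multi))     # CESM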

I'm trying to decipher the differences between the multi- and single-case calls. Since you just wrote that code, maybe you can find it faster, but I'll keep working on it. I'm assuming you have plenty on your plate, but if the solution is obvious to you, please let me know!

I'm running with the data requirement "scalar_coordinates": {"lev" : 500}, which might be why this hasn't been seen before. It's Rich's blocking POD, so far completely unmodified.
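
For context, a "lev": 500 scalar coordinate request boils down to selecting a single pressure level, presumably 500 hPa (consistent with the Z500 files mentioned later in this thread), from a 4-D field. Here is a small xarray sketch of that operation; it is illustrative only and is not the framework's ExtractLevelFunction, and the dummy coordinate is in hPa while model files often store pressure in Pa, which is the kind of translation the preprocessor has to handle.

# Illustration only (not MDTF's preprocessor): select the 500 hPa level from a
# dummy 4-D geopotential-height field with xarray.
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.random.rand(3, 4, 5, 6),
    dims=("time", "lev", "lat", "lon"),
    coords={"lev": [850.0, 700.0, 500.0, 250.0]},  # pressure levels in hPa
    name="zg",
)

# A scalar_coordinates request like {"lev": 500} amounts to this selection,
# which removes the lev dimension from the result.
zg500 = da.sel(lev=500)
print(zg500.dims)   # ('time', 'lat', 'lon')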

To zoom out a little: I'm finding it difficult to compare code when there are copies of the multi-case routines in the same file as the original single-case ones. Is there some reason why we can't have a differently-named file with the multi- vs. single-case functions, for easy diffing (e.g. in preprocessor.py)? Or, even better, could we modify the functions to work with either multi or single case, so we don't have copies of almost-identical functions? Or is this just a stop-gap solution because we'll replace the preprocessor soon?

@bitterbark Thanks for trying the updates and digging some more into the issue. I probably missed something from the parent class(es), and Rich's POD caught it with the multirun config. I'll work on it when I have some time this week.

As for the code design issue: you have identified the (many) drawbacks of my "inherit-everything-you-need-and-try-not-to-break-anything" approach. I did some consolidation, but the module dependencies and the sizes of some of the classes made it impossible to consolidate all multirun functionality into a single module (believe me, I tried). Refactoring will be part of the PP redesign, now that I'm not a complete newbie to MDTF-diagnostics or OOP in general.
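
As a sketch of the "work with either multi or single case" idea discussed above: one common pattern is to normalize a single-case run into a one-entry multirun configuration, so a single code path serves both. The classes below are made up for illustration and are not a proposal for the framework's actual class hierarchy.

# Made-up illustration of one code path handling one or many cases.
from dataclasses import dataclass, field


@dataclass
class Case:
    name: str
    convention: str


@dataclass
class RunConfig:
    cases: dict = field(default_factory=dict)

    @classmethod
    def from_single(cls, case):
        # A single-case run is just a one-entry multirun config.
        return cls(cases={case.name: case})


def edit_request(config):
    # Same logic whether there is 1 case or N: no duplicated variants.
    return [f"{name}: {case.convention}" for name, case in config.cases.items()]


single = RunConfig.from_single(Case("QBOi.EXP1.AMIP.001", "CESM"))
multi = RunConfig(cases={
    "cesm_mdtfv3_timeslice": Case("cesm_mdtfv3_timeslice", "CESM"),
    "QBOi.EXP1.AMIP.001": Case("QBOi.EXP1.AMIP.001", "CESM"),
})
print(edit_request(single))
print(edit_request(multi))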

Thanks for your help, Jess. I totally understand about not being able to write the code the way you want!
I'm going to work on this this afternoon, so if you have any insight into directions I should pursue, let me know. Otherwise, no pressure; I know you have a lot on your plate!

@bitterbark I was able to fix the bug in the ExtractLevelFunction call; this is the first test of level extraction in multirun mode, so I'm glad you caught it. However, the framework is having issues finding data (see PR linked above). I'll work on this some more tomorrow. Note that I only have the QBOi dataset to work with (I made a dummy copy with a different name to test with), so please let me know where I can grab the cesm_mdtfv3_timeslice data when you have a chance.

Great, thanks @wrongkindofdoctor!
The cesm_mdtfv3_timeslice data is available on Globus.

Also, I grabbed the mod and will work on the data error.

Data error is my fault; the QBOi...Z500 file is bad. Will post it when fixed.

I fixed the QBOi ...Z500 file. To be honest, it is from another run (we didn't save that field in the QBOi run because this POD wasn't in existence), but I've gotten the preprocessor to load it anyway. It is available from the Globus endpoint:
cesm_mdtfv1_timeslice_public/QBOi.EXP1.AMIP.001/day/QBOi.EXP1.AMIP.001.Z500.day.nc

@bitterbark Thank you so much! I merged the bug fix into the main branch, and will update GFDL's local copies of the QBOi.EXP1.AMIP.001.Z500.day.nc data.