Change the default 'combinewts' option of averager from 0 to 1 ?
jypeter opened this issue · 12 comments
@durack1 what do you think of my problem below?
I have very recently discovered that I may have misused genutil.averager
incorrectly for years. I assumed that the order in which you did a spatial average did not matter, but this seems to be only true when the variable is not masked
The results are quite different! I think that the user expects what you get when specifying combinewts=1
, except that the default is ZERO! That's why you may want to change the default value
genutil.averager
is a very useful component of CDAT but there does not seem to be an example script/notebook using it on the web site. You may want to add one
Example with one time step of clt.nc
No problem here, because there are no masked points
>>> clt.shape
(46, 72)
>>> clt.count()
3312
>>> 46*72
3312
>>> cdutil.averager(clt, axis='xy', weights=['weighted','weighted'])
variable_182
masked_array(data=62.71149853,
mask=False,
fill_value=1e+20)
>>> cdutil.averager(clt, axis='yx', weights=['weighted','weighted'])
variable_190
masked_array(data=62.71149853,
mask=False,
fill_value=1e+20)
>>> cdutil.averager(clt, axis='yx', weights=['weighted','weighted'], combinewts=1)
variable_204
masked_array(data=62.71149853,
mask=False,
fill_value=1e+20)
Example with one time step of tas_cru_1979.nc
Some points are masked, and the values returned by cdutil.averager are quite different
>>> tas.shape
(36, 72)
>>> 36*72
2592
>>> tas.count()
1823
>>> genutil.averager(tas, axis='xy', weights=['weighted','weighted'])
variable_84
masked_array(data=12.62572236,
mask=False,
fill_value=1e+20)
>>> genutil.averager(tas, axis='yx', weights=['weighted','weighted'])
variable_92
masked_array(data=14.68750813,
mask=False,
fill_value=1e+20)
>>> genutil.averager(tas, axis='yx', weights=['weighted','weighted'], combinewts=1)
variable_110
masked_array(data=14.70993489,
mask=False,
fill_value=1e+20)
>>> genutil.averager(tas, axis='xy', weights=['weighted','weighted'], combinewts=1)
variable_130
masked_array(data=14.70993489,
mask=False,
fill_value=1e+20)
@jypeter I tocuh on this issue here: https://github.com/CDAT/Jupyter-notebooks/blob/master/scientific/Detrend_Data/Detrend_data.ipynb
@davis278 might have another/cleaner version
@jypeter and @doutriaux1, I do have a more detailed version of the Detrend Data notebook, but it is in review by a scientist who is presently at AGU. I will see if the review can be completed next week so we can publish the more detailed version of the Detrend Data notebook as soon as possible.
@doutriaux1 you do mention highlight a scientific point about the order of operations, but everything in the script use axis='xy'
, so we don't really know what would append if the user used axis='yx'
instead, and which one is better. I think that the only way to get the correct result would be to explicitly use combinewts=1
, or have this as a default
Besides, as I have mentioned in the 3rd bullet point of CDAT/cdat.github.io#197, when you specify `axis='xy', the user does not know (that is, the documentation does not say) if the average is performed on x then y, or y then x
I think it would be extremely useful to have a notebook covering correct (that is weighted) spatial and temporal averaging, and showing that the results you get can be quite different than what you get with arithmetic averaging. That's a point that we have to tell our new students over and over, and I'm not sure that other python packages than genutil make it possible.
This would also be the opportunity to demonstrate getWeight
, how to multiply the cell weights by the Earth surface to get the cells' surface (a question we also get from time to time), how to use setTimeBoundsMonthly
and the like, ...
Maybe spatial and temporal averaging could go to different notebooks, since temporal should also cover all the predefined climatology
stuff
A notebook about correct regridding would be nice as well :)
These should probably go to CDUTIL and GENUTIL (should it just be called genutil now?) on Jupyter-notebooks Tutorials, and guess what, there seems to be a notebook almost ready for this! CDAT/Jupyter-notebooks#25
@jypeter thanks for raising this issue. The CDAT package provides a user with a phenomenal head start as to data manipulation, but as you have highlighted nuances of the function defaults needs to be carefully considered. I think augmenting the examples, along with better function documentation would be a great step forward
@jypeter thanks for your comments. I've followed up on the review of the new version of the Detrend Data notebook and due to the holidays, it likely will not be ready until the new year. If you have a pressing need, let me know and I'll try to get you a copy of the notebook sooner rather than later. Admittedly, the revised Detrend Data notebook addresses only a few of the points raised here.
Thanks @davis278. There is no rush for the detrend notebook, especially if it does not mention combinewts
.
- maybe somebody should have a quick look at the existing notebooks using
averager
, and make sure they usecombinewts=1
if this is required to have a correct result with masked data - I still think
combinewts=1
should be the default, because I'm not sure what you get withcombinewts=1
andaxis='xy
' oraxis='yx'
makes sense (ask users if they really want to get different results with these when their data is masked...) - if the CDAT/Jupyter-notebooks#25 notebook (mentioned above) is not on the web site, it should be put back on a fast track to get there
Marking issue as stale, since there has been no activity in 30 days.
Unless the issue is updated or the 'stale' tag is removed, this issue will be closed in 7 days.
I still think that we need a SAFE default value for combinewts
! Because we don't know if the users will read all the docstring and find the notebook
averager
is a super useful asset of CDAT and would probably deserve its own notebook. Especially showing how you can easily get wrong results if you just use an arithmetic mean for spatial averaging (or the wrong parameters for the function)
Marking issue as stale, since there has been no activity in 30 days.
Unless the issue is updated or the 'stale' tag is removed, this issue will be closed in 7 days.