rubisco-sfa/ILAMB

Integration bounds

goujou opened this issue · 3 comments

Using the method Variable.integrateInTime with t0 == tf should lead to a zero value as far as I interpret integrals. Unfortunately, it does not because

line 299: time_bnds[(t0>time_bnds[:,0])*(t0<time_bnds[:,1]),0] = t0

changes the variable time_bnds in a way that the idea of line

300: time_bnds[(tf>time_bnds[:,0])*(tf<time_bnds[:,1]),1] = tf

does not have any effect. So the end boundary of the integral is indeed larger than the value specified. Actually, I am not sure of what other things might happen in other situations. It is a tricky thing to check.

The same issue is very likely to happen in the integrateInDepth method, lines 418--423.

Yes, there is some ambiguity here. Appreciate you bringing up the issue, the following is a little background.

On the one hand, what you describe is exactly how trimming / integration bounds should work (and it doesn't). If you do things this way, then you should also split cells. Say if you are integrating in time from t0 to tf, but those do not line up with the time_bounds breaks of a given dataset. Then you need to find the interval in which the integration limit is located, split that cell and update its time and time_bound values.

For time, this is not so hard. For space, this means also updating cell areas--doable but more complicated. You would need to store the land fractions with the variable so you could get the cell area from the new lat/lon bounds and then also multiply by the same land fraction to get the new land area. This is all doable and how it "should" work, but we run into problems doing this.

What happens a lot is that modeling centers have slightly different ways of representing, for example, time. So then when you try to compare model outputs to observational data the times limits of the intervals are technically off, but the same in spirit. For example, they are all monthly means just have slightly different beginning and ending times from my best attempt to convert their calendars. If you are precise about the limits and trimming, you end up with complicated comparisons--even if the times of the two datasets you are comparing are only slightly off.

For that reason, the idea of trimming or integration limits is more in the spirit of including the intervals which are to be part of the integration. It isn't as precise, but it ends up getting us more of what we wanted in the first place. As you are digging through the code, you will see that I am inconsistent--a sign that I cannot make up my mind either. There are certainly pathological cases like what you raised.

I probably need to develop a terminology for (a) when you want to include cells in a limit but not split them and (b) when you really intend to precisely trim. We use both concepts in the analysis. Ultimately my plan for down the road is to re-implement ILAMB.Variable as an accessor of a xarray.DataArray. Here is some initial work:

https://gist.github.com/nocollier/4e020d830ff80105a62789bb675350f4

I like a lot of what xarray has, but their analysis treats the information as discrete data and not a function. This could be a good time to improve this

At the moment I cannot get too far into fixing this, but will flag this discussion for future enhancements.

I very much appreciate the work you are doing. Your code is great and saves me from falling into hundreds of traps I would not even have been aware of. The way people use 'time' in netCDF files is indeed very confusing, even more the use of the time bounds. I am totally on your side that there is no way to make the code perfect for everyone. For that reason, I would like to take your code and adapt it in the few lines that need adaptation for my purposes. If you are fine with that, let me know how to cite you correctly. As I said, your code is a huge help for me.

From what I expected, I thought you would split the intervals for integration purposes. Maybe you should mention and point out that you do not. Actually, that was one of my main reasons to use ILAMB in the first place. It is fine that it is not handled the way I would like to have it, easy enough to adapt it, but maybe there is a way to make this clear from the beginning.

Another thing I could not understand: The computation of bounds (_createBnds) seems strange to me. For example in depth, why would there be a value with a negative sign? It disturbs the integral. Is this just a best-guess-solution? If so, then I am fine with it. To me (coming from a slightly different field) it seems like THE way to go, and I cannot see why. Might be good in some cases, might be bad in other cases.

Still, I think that the issue I mentioned here is a bug. In my opinion the code does not do what it was intended to do. I show you my workaround, then you might better understand what I mean:

ILAMB code might be flawed here

depth_bnds = np.copy(self.depth_bnds)

ind = np.where((z0<depth_bnds[:,1])*(zf>depth_bnds[:,0]))[0]

depth_bnds[(z0>depth_bnds[:,0])*(z0<depth_bnds[:,1]),0] = z0

depth_bnds[(zf>depth_bnds[:,0])*(zf<depth_bnds[:,1]),1] = zf

depth_bnds = depth_bnds[ind,:]

dz = (depth_bnds[:,1]-depth_bnds[:,0])

fixed code begin
    depth_bnds = np.copy(self.depth_bnds)
    depth_bnds_0 = np.copy(depth_bnds[:,0])
    depth_bnds_1 = np.copy(depth_bnds[:,1])
    ind        = np.where((z0<depth_bnds[:,1])*(zf>depth_bnds[:,0]))[0]
    depth_bnds_0[(z0>depth_bnds[:,0])*(z0<depth_bnds[:,1])] = z0
    depth_bnds_1[(zf>depth_bnds[:,0])*(zf<depth_bnds[:,1])] = zf
    depth_bnds = depth_bnds[ind,:]
    depth_bnds[:,0] = depth_bnds_0[ind]
    depth_bnds[:,1] = depth_bnds_1[ind]
    dz         = (depth_bnds[:,1]-depth_bnds[:,0])
fixed code end

I actually moved away from xarray, then I was told about ILAMB and found it great. Does it mean getting back to xarray then? No prejudices here, just I'd like to know where to put my focus on.

Will take a look at this as soon as I can get some time. Appreciate you tracking down these issues.

If I go to xarray, it would be either using it as a backend to my implementation in which case the interface would be no different or to bring ILAMB-like syntax to xarrays. Either way, I would keep the functionality that ILAMB has.