ecmwf/earthkit-meteo

How to call the computations with other objects than numpy arrays

Opened this issue · 1 comments

Currently, all the methods can only be called with numpy arrays as input/output. This issue is dedicated to finding a way to call these methods with other objects like earhkit-data fieldlist, xarray etc...

Some initial thoughts

As a result of internal discussions, one idea is to have two versions for each method:

  • one that takes only ndarrays as an input and located in an array submodule (in the future these methods may accept other array formats)
  • a higher level one that can take other objects, internally calling the array version

Now, the requirement is that the higher level methods should be able to:

  1. extract the right ndarray(s) from the input
  2. pass it to the array level routine
  3. create and return an object with the same type as the input storing the resulting data

To tackle task 1 and 2 we need to give instructions on how to interpret the arguments of the array version methods. We investigate 3 different types of argument lists below.

cos_solar_zenith_angle()

"array" version would be in earthkit.meteo.solar.arrray:

def cos_solar_zenith_angle(date, latitudes, longitudes)

"high level" version would be in earthkit.meteo.solar:

def cos_solar_zenith_angle(*args, **kwargs)

We should be able to call the high level version as follows:

from earthkit.meteo.solar import cos_solar_zenith_angle

# ds is:
# - earthkit-data  fieldlist
# - earthkit-data  tensor
# - xarray
# - pandas DataFrame with date, lat and lon columns
res = cos_solar_zenith_angle(ds)

# the arguments can be the same as above, on top of that
# each can be a number (my_date a datetime.datetime) or an ndarray
res = cos_solar_zenith_angle(my_date, my_lat, my_lon)

# my_date is a datetime.dateime and ds is the same as in the first case
res = cos_solar_zenith_angle(my_date, ds)

As a generic solution, the high level methods could be implemented with a decorator, e.g.:

from earthkit.meteo.solar  import array

@param_args("date", "latitude", "longitude") 
def cos_solar_zenith_angle(*args, **kwargs):
    return arrray.cos_solar_zenith_angle(*args, **kwargs)

where @param_args would describe each argument of the low level "array" method. A very simple prototype using what is available now in earthkit-data/develop cab be seen in the notebook below. Please ignore the implementation, the main focus is on the actual usage of the decorator.

https://github.com/ecmwf/earthkit-meteo/blob/feature/call-with-objects/docs/experimental/compute_object.ipynb

The example does not create the required output object just returns the result of the array method.

cos_solar_zenith_angle_integrated()

Method cos_solar_zenith_angle_integrated has some positional and keyword arguments which are specified by the user and not extracted from the input: begin_date, end_date, intervals_per_hour, integration_order

def cos_solar_zenith_angle_integrated(
    begin_date,
    end_date,
    latitudes,
    longitudes,
    *,
    intervals_per_hour=1,
    integration_order=3)

The decorator could be written as:

from earthkit.meteo.solar  import array

@param_args(None, None, "latitude", "longitude") 
def cos_solar_zenith_angle_integrated(*args, **kwargs):
    return array.cos_solar_zenith_angle_integrated(*args, **kwargs)

where None indicates that the given positional argument should be passed straight to the array method. We could call the method like that (assuming that all the kwargs are passed straight to the array method):

# my_start_date, my_end_date are  datetime.datetime, 

# ds is an earthkit-data  fieldlist
res =  cos_solar_zenith_angle_integrated(my_start_date, my_end_date, ds[0])

# my_lat and my_lon are numbers or arrays
res =  cos_solar_zenith_angle_integrated(my_start_date, my_end_date, my_lat, my_lon)

potential_temperature ()

Let us look at this "array" method (yet to be added to earthkit-meteo):

# the array version
def _potential_temperature(t, p):
    # t: temperature in K
    # p: pressure in Pa
    return t*(100000./p)**0.285611

The challenge in writing the decorator is that "p" should only be extracted for the fields containing "t". However, our very simple decorator approach cannot declare it. E.g. imagine we have (see the notebook for the full example):

@param_args("t", "p") 
def potential_temperature(*args, **kwargs):
    res = _potential_temperature(*args, **kwargs)
    return res

If ds is a fieldlist with 18 pressure level fields out of which only 6 are "t" then calling:

res = potential_temperature(ds)

will fail because "t" will be extracted for 6 fields while "p" for 18 fields.

One possible solution could be to indicate the relation between "t" and "p" with the following syntax:

@param_args("t", "p(t)") 
def potential_temperature(*args, **kwargs):
    res = _potential_temperature(*args, **kwargs)
    return res

We also need to consider that pressure can also be specified as separate fields (e.g. for height level data) or has to be computed (e.g. for model level data from a specified 'lnsp' field).

Creating the output object

Let us focus on the case when the input object is a GRIB fieldlist. To create the resulting object we need to provide information to generate the GRIB message. E.g.

from earthkit.meteo.solar  import array

@param_args("date", "latitude", "longitude", output="cos_solar_zenith_angle") 
def cos_solar_zenith_angle(*args, **kwargs):
    return array.cos_solar_zenith_angle(*args, **kwargs)

and the decorator should be clever enough to figure out that on the resulting grib message it has to set "paramId=214001".

Question of iteration

For large input data that does not fit into the memory we want to call the computations in a loop. It is not clear how this loop should be implemented when calling earthkit.meteo methods.