xarray-contrib/xarray-simlab

Add access to all master clock values in runtime context

benbovy opened this issue · 9 comments

(@jeanbraun's suggestion)

This could be useful, e.g., for vectorized computation of time-dependent values at the initialize stage, e.g.,

@xs.process
class SineForcing:
    amplitude = xs.variable()
    period = xs.variable()

    rate = xs.variable(dims='time')
    
    @xs.runtime(args='master_clock_values')
    def initialize(self, time):
        self.rate = (1 + self.amplitude + self.amplitude * np.sin(1 * np.pi * time / self.period))**5

One big problem, however, is that the rate variable in the example above should ideally have the same dimension label than the master clock dimension, but the latter is undefined (it is only known when running a model). We could define a specific placeholder, but it gets complicated...

I think I have a use-case for this:
I am trying to model a plant-growth model in simlab. So at the core (oversimplified) would be:

import xsimlab as xs
import numpy as np
from math import sin, cos, tan, acos, pi, sqrt, exp, radians

@xs.process
class dBiomass:
    """calculates biomass change based on light intensity
    """
    #assuming that fraction intercepted light is 1
    eff_light = xs.variable(description='light use efficiency', default=3)
    latitude = xs.variable(default = 59)
    maxrad = xs.variable(dims='time',description='extraterrestial maximum radiation [mm/day]',intent='out')
    biomass = xs.variable(
        dims='time',intent='out',description='grown biomass'
    )
    
    @xs.runtime(args='master_clock_values')
    def initialize(self, datetime):
        #assumes the datetime to be a datetime object
        #https://stackoverflow.com/questions/620305/convert-year-month-day-to-day-of-year-in-python
        #calculate  maximum extraterrestial radiation in mm/day
        #from FAO 56 page 9
        self.latitude = radians(self.latitude)
        self._day = datetime.timetuple().tm_yday
        self._Gsc = 0.0820 #MJ/m**2/min solar constant
        self._solar_declination = 0.409*sin(2*pi/365*self._day-1.39)
        self._dr = 1+0.033*cos(2*pi/365*self.day) #inverse relative distance
        self._omegas = acos(-tan(self.latitude)*tan(delta)) #sunset hour angle
        self.maxrad=0.408*24*60/pi*self._Gsc*self._dr*(self._omegas*sin(self.latitude)*sin(self._solar_declination)+cos(self.latitude)*cos(self._solar_declination)*sin(self._omegas))
    
    @xs.runtime(args='step_delta')
    def run_step(self,dt):
        self._d_biomass = self.eff_light*self.maxrad*dt
        
    def finalize_step(self):
        self.biomass += self._d_biomass

biomass_model_very_raw = xs.Model({'Biomass':dBiomass})
# %create_setup biomass_model_very_raw --default --verbose
import xsimlab as xs
from datetime import datetime, timedelta
ds_in = xs.create_setup(
    model=biomass_model_very_raw,
    clocks={
        'time':np.arange(datetime(2010,7,1), datetime(2015,7,1), timedelta(days=1)).astype(datetime)
    },
    master_clock='time',
    input_vars={
        # light use efficiency
        'Biomass__eff_light': 3,
        # ---
        'Biomass__latitude': 59,
    },
    output_vars={}
)
ds_in
out_ds = ds_in.xsimlab.run(biomass_model_very_raw)

But I get the following error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-15-cb0a1185e7ff> in <module>
      1 ds_in
----> 2 out_ds = ds_in.xsimlab.run(biomass_model_very_raw)

~/miniconda3/envs/EHydrology/lib/python3.7/site-packages/xsimlab/xr_accessor.py in run(self, model, batch_dim, check_dims, validate, store, encoding, decoding, hooks, parallel, scheduler, safe_mode)
    821         )
    822 
--> 823         driver.run_model()
    824 
    825         return driver.get_results()

~/miniconda3/envs/EHydrology/lib/python3.7/site-packages/xsimlab/drivers.py in run_model(self)
    475                 *args,
    476                 parallel=self.parallel,
--> 477                 scheduler=self.scheduler,
    478             )
    479 

~/miniconda3/envs/EHydrology/lib/python3.7/site-packages/xsimlab/drivers.py in _run(dataset, model, store, hooks, validate, batch, batch_size, parallel, scheduler)
    334 
    335     try:
--> 336         model.execute("initialize", rt_context, **execute_kwargs)
    337 
    338         for step, (_, ds_step) in enumerate(ds_gby_steps):

~/miniconda3/envs/EHydrology/lib/python3.7/site-packages/xsimlab/model.py in execute(self, stage, runtime_context, hooks, validate, parallel, scheduler)
   1021         else:
   1022             for p_obj in self._processes.values():
-> 1023                 _, (_, signal_process) = self._execute_process(p_obj, *execute_args)
   1024 
   1025                 if signal_process == RuntimeSignal.BREAK:

~/miniconda3/envs/EHydrology/lib/python3.7/site-packages/xsimlab/model.py in _execute_process(self, p_obj, stage, runtime_context, hooks, validate, state)
    841 
    842         state_out, signal_out = executor.execute(
--> 843             p_obj, stage, runtime_context, state=state
    844         )
    845 

~/miniconda3/envs/EHydrology/lib/python3.7/site-packages/xsimlab/process.py in execute(self, p_obj, stage, runtime_context, state)
    528             return {}, RuntimeSignal.NONE
    529         else:
--> 530             signal_out = executor.execute(p_obj, runtime_context, state=state)
    531 
    532             skeys = [p_obj.__xsimlab_state_keys__[k] for k in self.out_vars]

~/miniconda3/envs/EHydrology/lib/python3.7/site-packages/xsimlab/process.py in execute(self, p_obj, runtime_context, state)
    390             p_obj.__xsimlab_state__ = state
    391 
--> 392         args = [runtime_context[k] for k in self.args]
    393 
    394         signal = self.meth(p_obj, *args)

~/miniconda3/envs/EHydrology/lib/python3.7/site-packages/xsimlab/process.py in <listcomp>(.0)
    390             p_obj.__xsimlab_state__ = state
    391 
--> 392         args = [runtime_context[k] for k in self.args]
    393 
    394         signal = self.meth(p_obj, *args)

~/miniconda3/envs/EHydrology/lib/python3.7/site-packages/xsimlab/drivers.py in __getitem__(self, key)
     46 
     47     def __getitem__(self, key: str) -> Any:
---> 48         return self._context[key]
     49 
     50     def __setitem__(self, key: str, value: Any):

KeyError: 'master_clock_values'

A minimal example: Ideally I would like to also read precipitation data from an array that is also on the same time as the clock. I cannot find any documentation on this, I could maybe work out this example further and make a PR for the docs?

a (very pedantic) comment:
Since Github moves from master to main, (to avoid racism) maybe master clock could be renamed to main clock? Easy to do while the project is still young?

Ok. so apparently it is not implemented yet...
It is possible to access time data at runtime though: using the

@xs.runtime(args='step_start')
def run_step(self,dt,date):
    #do stuff with date here such as calculate maxrad

now there is a problem with accessing the datetime as an actual datetime :/

Ideally I would like to also read precipitation data from an array that is also on the same time as the clock.

It is possible to access time data at runtime though

You could also set time-varying input values in xs.create_setup, see #166.

now there is a problem with accessing the datetime as an actual datetime :/

Although I haven't tried it yet, I don't think anything would prevent you using datetime-friendly data types for the clock values, like you can use it with xarray.

Since Github moves from master to main, (to avoid racism) maybe master clock could be renamed to main clock? Easy to do while the project is still young?

I'm very open to that idea. We could go through some depreciation cycle to ensure smooth transition.

Ideally I would like to also read precipitation data from an array that is also on the same time as the clock.

It is possible to access time data at runtime though

You could also set time-varying input values in xs.create_setup, see #166.

This is indeed a valid workaround, but has some drawbacks:

  1. it disallows creating a setup in a single step (afaik), since a variable depends on the coordinates to create it.
  2. it is, indeed a pre-processing step, while the calculations would ideally be part of a model
    I tried that in the following code:
biomass_model_very_raw = xs.Model({'biomass':dBiomass})

ds_in = xs.create_setup(
    model=biomass_model_very_raw,
    clocks={
        'day':pd.date_range('2000-01-01',periods=365)
    },
    master_clock='day',
    input_vars={
        # light use efficiency
        'biomass__eff_light': 3,
    },
#     output_vars={'time':'biomass'}
)

def maxrad(day,latitude = 59.):
#     day = pd.to_datetime(str(date)).timetuple().tm_yday#datetime.item().timetuple().tm_yday
    latitude=radians(latitude)
    Gsc = 0.0820 #MJ/m**2/min solar constant
    solar_declination = 0.409*sin(2*pi/365*day-1.39)
    dr = 1+0.033*cos(2*pi/365*day) #inverse relative distance
    omegas = acos(-tan(latitude)*tan(solar_declination)) #sunset hour angle
    return 0.408*24*60/pi*Gsc*dr*(omegas*sin(latitude)*sin(solar_declination)+cos(latitude)*cos(solar_declination)*sin(omegas))

ds_in['biomass__maxrad']=xr.DataArray(dims='day',data=np.vectorize(maxrad)(ds_in['day.dayofyear']))
out_ds = ds_in.xsimlab.run(biomass_model_very_raw)

However, I get the error:

ValueError: Missing master clock dimension / coordinate

even though I set it in the setup.

Although I haven't tried it yet, I don't think anything would prevent you using datetime-friendly data types for the clock values, like you can use it with xarray.

So there is a problem here: datetime in xarray coordinates is datetime64[ns] format, where .astype(datetime) converts to int in stead of datetime bug report. Xarray smartness like day['time.dayofyear'] is not possible, since we get the data passed, in stead of DataArray or something.

So: all modification of datetime (at runtime) is based on numpy datetime64[ns], which ATM has horrible conversion.
e.g. conversion to datetime format,

I'm very open to that idea. We could go through some depreciation cycle to ensure smooth transition.

sounds great! I have absolutely no experience with going through deprecation cycles, however :/

another note:

Actually the only sensible place to access the entire array of time values would be in initialization step right? and then index those variables in other timesteps.

The ValueError: Missing master clock dimension / coordinate is because for some reason of the line below removes the attributes of the day coordinate (xarray-simlab relies on some of those attributes to determine the clock coordinates among the coordinates in the dataset):

ds_in['biomass__maxrad']=xr.DataArray(dims='day',data=np.vectorize(maxrad)(ds_in['day.dayofyear']))

Instead of directly assigning variables and/or values to the dataset, you should use instead xs.create_setup or Dataset.xsimlab.update_vars, e.g.,

ds_in = ds_in.xsimlab.update_vars(
    model=biomass_model_very_raw,
    input_vars={'biomass__maxrad': xr.DataArray(dims='day',data=np.vectorize(func)(ds_in['day.dayofyear']))}
)

The latter solution has the great advantage of validating (and/or converting) the new given input values so that it is consistent with the model.

Xarray smartness like day['time.dayofyear'] is not possible, since we get the data passed

This should be possible if we address #141.

all modification of datetime (at runtime) is based on numpy datetime64[ns], which ATM has horrible conversion.

Is the conversion below that horrible? (from this SO question)

@xs.process
class A:
    foo = xs.variable(default=1)
    
    @xs.runtime(args='step_start')
    def run_step(self, t):
        datetime_obj = t.astype('datetime64[s]').item()

Actually the only sensible place to access the entire array of time values would be in initialization step right? and then index those variables in other timesteps.

Yes, but one tricky thing is that we don't know the name of the clock dimension (it is set when creating a model setup). So if we want to declare an output variable with values along this dimension, we have a problem: `xs.variables(dims=???, intent='out').

In the case of using update_vars(....), the actual dimension of maxrad = xs.variable(dims=['whatever',()]) is ignored and an array with the named dim is passed. So we could just as well create a placeholder for the clock name.
Conceivably, one would want to work in a model with multple timesteps as in this paper. Tehn, the %create_setup magic command could just as well also specify time dimensions like so:

#define a clock in a process with dummy clock_dims='something' to which one can refer
@xs.process
class dBiomass:
    maxrad = xs.variable(clock_dims='day',description='maximum radiation')
   
   @xs.runtime(args='day') #or even another decorator
   def initialize(self,day):
       maxrad = 2*day

@xs.process
class OtherClock:
    some_var = xs.variable(clock_dims='other_clock',description='some other variable')
    monthly_var = xs.variable(clock_dims='monthly_clock',description='var that wants a monthly clock') 
    @xs.runtime(args='other_clock')
    def initialize(self,other_clock):
        

ds_in = xs.create_setup(
    model=biomass_model_very_raw,
    clocks={
        #clock for calculating maxrad, uses datetime format
        'day':pd.date_range('2000-01-01',periods=365),
        # clock for calculating some other variable
        'other_clock': 'day'
        #clock that wants to use its own clock for initialization
        'monthly_clock': pd.date_range('2000-01-01','2001-01-01',periods=12)
    },
    master_clock='day',
    input_vars={
        # light use efficiency
        'dBiomass__eff_light': 3,
    },
    output_vars={'dBiomass__biomass':'day'}
)

and then in the preprocessing, we could conceivably just pass the clocks in, and link to the correct clock in the case of a reference to an earlier clock, and pass that. Otherwise, maybe something like a global_clock_name, where main_clock/master_clock is a reserved name could work?

In the case of using update_vars(....), the actual dimension of maxrad = xs.variable(dims=['whatever',()]) is ignored and an array with the named dim is passed.

We could actually check the input dimension labels when creating an input dataset with xs.create_setup or xsimlab.update_vars. Currently they are validated when calling xsimlab.run(), as input datasets may have been loaded from file or created by other means.

So we could just as well create a placeholder for the clock name.

I guess it would work, yes, at least for the main clock. The current usage for other clocks is to save snapshots at different times/frequencies than the main clock. Proper support of multiple timesteps is another story.

We could actually check the input dimension labels when creating an input dataset with xs.create_setup or xsimlab.update_vars. Currently they are validated when calling xsimlab.run(), as input datasets may have been loaded from file or created by other means.

my point here was that we have the exact same problem here as with accessing the time clock: we initialize maxrad=xs.variable(dims=???) without any care about the actual dims, since these are named inside the model class. Validation when updating therefore will give a negative, because we cannot know the clock dim name beforehand. since there is only one main clock, why not enforce a name of main_clock? this question may go off-topic, since it questions quite some in-place implementation...

I guess it would work, yes, at least for the main clock. The current usage for other clocks is to save snapshots at different times/frequencies than the main clock. Proper support of multiple timesteps is another story.

I created a pull request #168 but I think it should also address #141 in order to work well, since we have to set some variables as a DataArray. The example I use is quite verbose, but I could include it as a test?

I got your point, hence the placeholder suggestion in my 1st comment, e.g., something like xs.MASTER_CLOCK (let's address master vs. main in another issue/pr) that could be used when declaring variable dimensions, e.g., xs.variable(dims=xs.MASTER_CLOCK, intent='out'), so that when creating the output dataset xarray-simlab knows it has to replace this placeholder with the actual master clock label defined by the user. Any variable declared with this placeholder should probably have static=True set too.

I prefer this option over enforcing the name of the master clock.