BAMresearch/bayem

"impure" ModelErrorInterface.__call__()

Closed this issue · 7 comments

In computer programming, a pure function is a function that (roughly) has no side-effects and, given the same arguments, has the same return value. The benefit is that those functions are easy to reason about, do not require much knowledge of their context and are easy to test. An example would be the obvious
error = model_error(parameter_list)

In our implementation, however, we do

model_error.parameter_list[...] = something...
error = model_error()
  • You need to know that every model error has a member parameter_list
  • Another function, like latent.update(...) here, may actually change that list without you expecting/noticing it.

Very unintuitive.

I honestly do not know why I chose the impure version. One reason could have been the distinction between the (implementation-wise) completely different (!!!) methods ModelErrorInterface.__call__ and VariationalBayesInterface.__call__.

Anyways, I'll try to revert that to the pure version.

The main issue is to think about "Who owns the parameter list?". Currently, there is exactly one parameter lists per model error and it is also stored at the model error.

A purely programmatical change would be to just keep a single parameter list per model error, but store it at the inference problem. Therefore, instead of the member self.parameter_list, the model error would need a self.generate_parameter_list() method. Or the user defines it:

def InferenceProblem.add_model_error(self, model_error, parameter_list=None):
   if parameter_list is None:
      parameter_list = model_error.generate_parameter_list()
   ...

(edit:) such that it can be called with:

me = MyCustomModelError()
prm = me.my_custom_method_to_define_the_parameter_list()
p = InferenceProblem()
p.add_model_error(me, prm)

Is anyone willing to implement that? Technically, this is a rather easy task (I think), but would provide great insights in the library.

Just a remark, maybe it would be better to initialize an empty list when no parameter list is given. IMO that is the standard case, when all parameters are latent. Otherwise, the user can generate the parameter list directly before initializing the inference problem (or in the constructor by calling the generate_parameter_list from the model error). But this does not need to be a standard interface.

Just one additional comment, I would also suggest to remove the parameter list being stored but rather provide a function that transforms the global vector into a dict of dict (for each model error a parameter list). For the generation, I would also suggest to simplify the interface such that

p.define_shared_latent_parameter_by_name("B")

is replaced by

p.define_latent_parameter("B")

that automatically assumes this is a global, shared variable and named the same in all model errors. If someone wants the rather special case of local variables, they should provide a model error and eventually the local name this global latent variable is mapped to (e.g. in case the inference guy defines E, and one forward model requires YoungsModulus and another one Emodul).

p.define_latent_parameter("B", shared=False, me=me, name_in_me="B_me")

where me could be a single me or a list.

Sounds good. The only issue is for vector valued parameters, where the dimension of "B_me" is unknown to the latent parameters. IMO that information must come from the model error, maybe in an (optionally provided) MEInterface.get_shape(self, name) that can also be used to check if B_me is a parameter of the model error in the first place.

So, unless specified otherwise via get_shape, all latent parameters are scalars, which should be a reasonable default case.

@aklawonn and I will have a pair programming session next week (with face mask and near an open window...) to try to solve this issue.

Maybe I am too focused on the current implementation and my applications, but I would state:

  • ModelError has to somehow inform the user or the library (for checks) what the needed parameters are.
  • Some of those parameters have really reasonable default values (e.g. some offset=0) that no one needs to know about, unless specifically declared as latent
  • In experiment 1, a parameter "E" could be latent. In experiment 2, it is deterministic and I want to set it directly via the model_error, not the inference problem.

Based on that, I'd say the model error must either have a parameter list as member (that the user can modify), or provide some kind of get_parameter_list() method. The latent.update(numbers) is replaced with latent_parameter_lists = latent.create_latent_parameter_lists(numbers), where latent_parameter_lists is dict{model_error_key, ParameterList}. Then, for each model error -- identified by model_error_key -- this parameter list with only the latent parameters is passed to the model error.

Within the model error, there must be something like:

class MyModelError:
# ...
    def __call__(self, latent_parameter_list):
        actual_parameter_list = join(self.parameter_list, latent_parameter_list)

In total, the model error still has a parameter list, but now there is not a single one, but three 😋

I would say in the default setting, the model error should not have a parameter list. That said, all the latent variables are passed to the model error in the call routine. If the users wants to store additional stuff in there, I don't see any reason why we should prevent this, and that joining the list can also be done. In this case, the user would have to overload the call function, but the standard model_error call method should just forward the passed parameter list to the forward model.