OpenSourceEconomics/respy

interface revision

peisenha opened this issue · 4 comments

Description

We are considering revising the respy interface, and this issue serves to collect thoughts and use cases. At the moment, we create simulate() and crit_func() through factory functions that perform a host of setup operations. In particular, they create the time-consuming StateSpace class instance.

For simulation:

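```python
import respy as rp

# The factory performs the costly setup (including the StateSpace) once.
simulate = rp.get_simulate_func(params, options)
df = simulate(params)
```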

For estimation:

```python
crit_func = rp.get_crit_func(params, options, df)
crit_func(params)
```

The purpose of the package is to serve as a computational sandbox, and some features of the current interface make this harder than it probably should be. However, I might simply be missing the proper workflow, so any clarification is welcome (@tobiasraabe).

  • I am running a bootstrap. Each time I sample a new dataset, I need to initialize a new criterion function, which entails the costly creation of the StateSpace instance even though it remains unchanged throughout the exercise.

  • I am investigating the effect of numerical tuning parameters on the shape of the likelihood function. I iterate over different numbers of Monte Carlo draws by changing the options file. Again, I need to re-create the criterion function each time. This might be relevant for your notebook as well, @rafaelsuchy.
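Both use cases share the same cost structure. The following self-contained toy sketch counts how often the expensive setup runs; `get_crit_func`, `build_state_space`, `run_bootstrap`, `scan_n_draws` and the `n_draws` option key are hypothetical stand-ins, not respy's actual implementation:

```python
import random

SETUP_CALLS = {"count": 0}


def build_state_space(options):
    """Hypothetical stand-in for the costly StateSpace construction."""
    SETUP_CALLS["count"] += 1
    return list(range(options["n_periods"]))


def get_crit_func(params, options, df):
    """Hypothetical factory in the style of rp.get_crit_func (params unused here)."""
    state_space = build_state_space(options)  # expensive setup, paid on every factory call
    data = list(df)  # the dataset is captured by the closure

    def crit_func(params):
        # Cheap per-evaluation work reusing the captured state space and data.
        mean = params["mean"]
        return sum((x - mean) ** 2 for x in data) / len(state_space)

    return crit_func


def run_bootstrap(params, options, df, n_samples, seed=0):
    """Use case 1: each resampled dataset requires a new criterion function."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_samples):
        sample = rng.choices(df, k=len(df))  # resample observations with replacement
        crit_func = get_crit_func(params, options, sample)  # state space rebuilt
        values.append(crit_func(params))
    return values


def scan_n_draws(params, options, df, draw_grid):
    """Use case 2: each Monte Carlo setting means new options and a new factory call."""
    values = []
    for n_draws in draw_grid:
        new_options = {**options, "n_draws": n_draws}  # hypothetical option key
        crit_func = get_crit_func(params, new_options, df)  # state space rebuilt anyway
        values.append(crit_func(params))
    return values
```

In both loops the setup runs once per iteration, although in this toy the state space depends only on `n_periods` and is therefore identical every time.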

I think for a bootstrap it does not really matter, because the setup cost is small compared to the cost of running the bootstrap itself. But in general it is true that we should save setup costs.

We implemented it this way because always re-creating everything is far less complex than taking an old model instance, determining which parts have to change, and re-creating only those.

Therefore, I suggest the following:

  • We first try to reduce the setup costs before we try to reduce the number of times we incur them. If the cost comes mainly from the StateSpace creation, there are definitely ways to make it faster.
  • If this is not enough, we try to cache the most expensive functions. We should not manually check what has to be re-created but use an existing solution like joblib's Memory.
  • Only if this is still not enough would I consider re-using instances of some model class.
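The caching step can be sketched with the standard library alone. The hypothetical `cached_on_disk` decorator below mimics the idea behind `joblib.Memory(location).cache`: hash the arguments and store the pickled result on disk under that hash. `build_state_space` is again a toy stand-in, and the options are assumed to be JSON-serializable:

```python
import functools
import hashlib
import json
import pickle
import tempfile
from pathlib import Path

CACHE_DIR = Path(tempfile.mkdtemp(prefix="respy_cache_"))
SETUP_CALLS = {"count": 0}


def cached_on_disk(func):
    """Hypothetical stand-in for joblib.Memory(...).cache, keyed on a hash of the options."""

    @functools.wraps(func)
    def wrapper(options):
        # Serialize deterministically so equal dicts produce the same key.
        key = json.dumps(options, sort_keys=True).encode()
        path = CACHE_DIR / f"{func.__name__}_{hashlib.sha256(key).hexdigest()}.pkl"
        if path.exists():
            return pickle.loads(path.read_bytes())  # cache hit: skip the setup
        result = func(options)
        path.write_bytes(pickle.dumps(result))  # cache miss: compute and store
        return result

    return wrapper


@cached_on_disk
def build_state_space(options):
    """Hypothetical stand-in for the costly StateSpace construction."""
    SETUP_CALLS["count"] += 1
    return list(range(options["n_periods"]))
```

As with joblib, the key covers every argument, so the cached function should receive only the inputs the state space actually depends on; otherwise changing an unrelated option (such as the number of draws) still triggers a rebuild.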

Points for our discussion on Thursday:

  • Yes, the setup costs are small for a serious bootstrap exercise, but they are "sizeable" during prototyping, which involves only a small number of function evaluations for testing purposes.

  • The caching solution looks interesting; we might just need to look for one that does not require writing to disk, depending on how large a dump of the StateSpace instance is.

  • Also, I would like us to discuss in our next call whether we want the respy interfaces to work with DataFrames that have Individual and Period as a pd.MultiIndex.

The MultiIndex is already implemented in #277 and will be merged into master at some point.
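For reference, a dataset indexed by Individual and Period as a `pd.MultiIndex` could look as follows. The two index levels come from the discussion; the `Choice` and `Wage` columns and their values are hypothetical and not necessarily respy's actual data format:

```python
import pandas as pd

# Hypothetical long-format panel: two individuals observed for three periods.
df = pd.DataFrame(
    {
        "Individual": [0, 0, 0, 1, 1, 1],
        "Period": [0, 1, 2, 0, 1, 2],
        "Choice": ["a", "b", "b", "a", "a", "edu"],
        "Wage": [10.0, 12.0, 13.0, 9.0, 9.5, None],
    }
).set_index(["Individual", "Period"])

# One individual's history (indexed by Period) ...
history = df.loc[0]
# ... or one period across all individuals (indexed by Individual).
period_1 = df.xs(1, level="Period")
```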