econ-ark/HARK

separate out model definition, solution method parameters, and simulation parameters


The parameters fed to HARK models may be:

  • part of the mathematical model definition
  • parameters for the solution algorithm (e.g. number of discretization points)
  • relevant to the simulation, but not the solution

These are currently all tossed to the AgentType class on initialization with no namespacing.

This makes it hard to, for example, export model definitions to HARK (#659), compare different solutions to the same model but with different solution algorithms (#659 (comment)), and disentangle the solution and simulation functionality (#495).
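To make the three categories concrete, here is a rough Python sketch. The parameter names are the familiar ones from the ConsIndShock family and the values are only illustrative; the namespaced layout at the bottom is purely hypothetical, not an existing HARK interface.

```python
# Current practice: everything goes to the AgentType constructor in one flat dict.
flat_params = {
    "CRRA": 2.0,           # model: relative risk aversion
    "DiscFac": 0.96,       # model: discount factor
    "Rfree": 1.03,         # model: risk-free return factor
    "PermShkCount": 7,     # solution: points in the permanent-shock discretization
    "TranShkCount": 7,     # solution: points in the transitory-shock discretization
    "aXtraCount": 48,      # solution: gridpoints for end-of-period assets
    "AgentCount": 10_000,  # simulation: number of simulated agents
    "T_sim": 120,          # simulation: number of simulated periods
}

# A hypothetical namespaced alternative (not an existing HARK interface):
namespaced_params = {
    "model":      {"CRRA": 2.0, "DiscFac": 0.96, "Rfree": 1.03},
    "solution":   {"PermShkCount": 7, "TranShkCount": 7, "aXtraCount": 48},
    "simulation": {"AgentCount": 10_000, "T_sim": 120},
}
```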

This is a good broad-stroke characterization. But in some cases the distinctions may be hard to make. For example, the "true" results for the model might depend on how many periods are simulated, or the process by which the aggregate shocks are drawn.

There are two possible responses in cases like this:

  1. Define all such parameters as part of the "model"
  2. Partition each of the categories mentioned above into subcategories
    • Fundamental parameters (like, relative risk aversion)
    • Parameters where there is a limiting solution, but where the specific implementation requires concrete choices
      • Like, number of points in the discretization of the solution
      • Or, number of agents to simulate in the simulation

I think I'm following you but I'd like to be sure...
Could you elaborate on what "true" means in this case?

By "true", he means that we say some distribution is lognormal in the model, but we don't actually integrate a function distributed over a lognormally distributed RV. We discretize the RV by one method or another and compute a numeric integral. The limit of that method as the number of discretization points approaches infinity is the "true" function.

To be clear, are you referring to the simulation process here?

Or as part of the solution?

I think these are both covered by my original proposed categories.

Likewise, a lot of macro models assume there's a continuum of agents. When those agents are meaningfully heterogeneous, you can't actually track an infinite number of idiosyncratic states. We simulate a large-ish finite number of agents, for very small values of "large".

Of course, an analytically solved mathematical model is going to be different from a numerically approximated solution to it, or a simulation of it.

But I don't understand why this makes parameterization of a model more complicated.

By "true", he means that we say some distribution is lognormal in the model, but we don't actually integrate a function distributed over a lognormally distributed RV. We discretize the RV by one method or another and compute a numeric integral. The limit of that method as the number of discretization points approaches infinity is the "true" function.

To be clear, are you referring to the simulation process here?

Or as part of the solution?

The solution. Well, actually both, since the shocks drawn when simulating need to come from the same process that was used when solving.

> I think these are both covered by my original proposed categories.

I guess what you mean is that the "model" would say that the distribution of shocks is lognormal, and the "solution" and "simulation" parts would each separately specify the particular way in which the lognormal distribution is instantiated. Well, OK, but since the right way to do things here is to use the exact same (literally numerically identical) discretization, it seems more natural and less error-prone to allow for this choice to be made in the description of the model but to have it identified as one of the "approximation" choices rather than something that belongs to the "true" model. That would make it easier and less error-prone, for example, to redo a complete model run where the only thing you changed was the discretization method or number of points.
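One minimal sketch of this idea, with made-up names rather than existing HARK structures: keep the approximation choices in a clearly labeled block that both the solver and the default simulator read, so that redoing a run with a different discretization is a one-line change.

```python
from dataclasses import dataclass, field, replace

@dataclass(frozen=True)
class Approximation:
    shock_method: str = "equiprobable"   # how the lognormal shock is discretized
    shock_points: int = 7                # number of discretization points

@dataclass(frozen=True)
class ModelSpec:
    # "True" model parameters ...
    crra: float = 2.0
    disc_fac: float = 0.96
    # ... plus the approximation choices, carried with the model but labeled as such.
    approx: Approximation = field(default_factory=Approximation)

baseline = ModelSpec()
finer = replace(baseline, approx=Approximation(shock_points=51))
# Both the solver and the default simulator would read spec.approx, so the two
# stages can never silently disagree about the discretization.
```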

> Likewise, a lot of macro models assume there's a continuum of agents. When those agents are meaningfully heterogeneous, you can't actually track an infinite number of idiosyncratic states. We simulate a large-ish finite number of agents, for very small values of "large".

> Of course, an analytically solved mathematical model is going to be different from a numerically approximated solution to it, or a simulation of it.

> But I don't understand why this makes parameterization of a model more complicated.

Not sure what you mean by this. Am guessing you are responding to Matt's comment elsewhere that it will be a substantial lift to get this done pervasively. The point is not that it would be more complicated to implement this (better) way of doing things from the outset. The point is just that revamping the existing code structure to do things this way would be a pretty substantial task.

OK, I now see what you're saying about the simulation and solution approximation parameters needing to match. That's clear. As I approach this problem, I'll keep this in mind in the design.

As things stand now, the parameters -- meaning, mainly, the data used as 'input' to the model when it is initialized -- are stored in a parameters namespace.

I wanted to add to this discussion based on the investigation done here:

#640 (comment)

Earlier in this issue, @llorracc argued that the approximation parameters -- i.e., the parameters determining how continuous variables are discretized -- need to be the same when a model is solved and when it is simulated.

I understand that this is conventional in economics.

Elsewhere, @llorracc has recommended separating the "true" model parameters -- which define its "infinite" or continuous form -- from those used in approximation. See #914

I think #914 is a good idea.

As we have discussed in several other contexts, we would like to decouple the idea of a model solution from the model itself. See #495. In principle, there may be several ways of approximately "solving" the same model. One thing HARK should support is the direct comparison of different solution methods applied to the same model.

A simple use case for this is: suppose we want to see the effects of discretization on the quality of an agent's solution procedure. To test this, one could:

  • Define an infinite model - ConsIndShock, for example
  • Compute solutions with varying discretization parameters (e.g. different point counts for the income shocks)
  • Simulate the consumers' behavior when they are exposed to continuous shocks, while they use their discretely approximated solution function (see the sketch after this list).
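The sketch below is not ConsIndShock itself, just a runnable two-period toy with the same shape: solve for savings using an N-point discretization of the income shock, then simulate agents who draw continuous shocks while following the discretely approximated savings rule. All function names and parameter values here are made up for illustration.

```python
import numpy as np
from scipy.stats import lognorm

rho, beta, R, sigma = 2.0, 0.96, 1.03, 0.2
income = lognorm(s=sigma, scale=np.exp(-0.5 * sigma**2))   # mean-one income shock

def u(c):
    return c ** (1.0 - rho) / (1.0 - rho)                  # CRRA utility

def solve(n_points, wealth=1.0, n_grid=500):
    """Pick first-period savings by grid search, taking expectations over an
    n_points equiprobable discretization of the income shock."""
    y = income.ppf((np.arange(n_points) + 0.5) / n_points)  # discretized shock values
    a = np.linspace(1e-4, wealth - 1e-4, n_grid)            # candidate savings levels
    value = u(wealth - a) + beta * u(R * a[:, None] + y).mean(axis=1)
    return a[np.argmax(value)]

def simulate(savings, wealth=1.0, n_agents=100_000, seed=0):
    """Agents face *continuous* lognormal shocks but use the discretely
    approximated savings rule; return average realized lifetime utility."""
    y = income.rvs(size=n_agents, random_state=seed)
    return np.mean(u(wealth - savings) + beta * u(R * savings + y))

for n in (3, 7, 21, 201):
    a_star = solve(n)
    print(f"{n:4d} shock points: savings={a_star:.4f}, welfare={simulate(a_star):.6f}")
```

As the shock count grows, the chosen savings level and the realized welfare under continuous shocks should settle down, which is one way to quantify the effect of the discretization on the quality of the approximated policy.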

Perhaps this is not the kind of question economists normally ask. But it is the sort of question I think one can and should be able to ask with HARK. Maybe economists should be asking it, and haven't had the tools to do so yet.

Currently, however, the solution and simulation code is deeply coupled (see #495) in the following sense: shared approximation parameters are assumed throughout. In fact, it is not yet possible to forward-simulate HARK models with continuous draws of the random shocks. This seems like an arbitrary restriction to me.

This is a nice summary of the history of these discussions and where we stand now.

I agree that it should be possible to forward-simulate a model with a different stochastic process than the one used in the solution method.

The procedure of simulating with exactly the same stochastic process that agents expected is part of the suite of assumptions encompassed by the term "rational expectations." But it can be interesting to know what outcomes look like when expectations differ from outcomes.

However, probably 99 percent of applications of models of this kind stick with the rational expectations approach. And even the remaining 1 percent always compute the rational expectations solution so that they can see what difference their proposed deviation makes.

Most people have probably done things the way you want to: writing the code for taking expectations and the code for simulating populations completely independently. One reason I have been focused on integrating the two as tightly as possible is that, in my experience, it often turns out to be extremely difficult to distinguish between an actual bug in one of the two stages (your simulation or your solution algorithm) and results that differ only because of almost-meaningless details of how the stochastic process is specified in each stage. To the extent possible, having a single specification that is shared between the forward and backward numerical procedures minimizes this particular source of error.

Hence, I'd argue that the natural approach is for the code to default to simulating the same process that was used in the solution. But I'm in agreement that it should be obvious to a user how to substitute some other simulation procedure for the default one. This means that we do need to make the description of the stochastic processes modular and plug-and-play, which I think is the thrust of your point.
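A rough interface sketch of that default-plus-override idea (names are illustrative, not existing HARK API): the simulator draws from the same discretized process used while solving unless the caller plugs in a different process object.

```python
import numpy as np
from scipy.stats import lognorm

class DiscretizedLognormal:
    """The process used while solving: N equiprobable points of a mean-one lognormal."""
    def __init__(self, sigma, n_points):
        probs = (np.arange(n_points) + 0.5) / n_points
        self.points = lognorm(s=sigma, scale=np.exp(-0.5 * sigma**2)).ppf(probs)

    def draw(self, n, rng):
        return rng.choice(self.points, size=n)

class ContinuousLognormal:
    """A drop-in alternative: genuinely continuous draws from the same distribution."""
    def __init__(self, sigma):
        self.sigma = sigma

    def draw(self, n, rng):
        return rng.lognormal(mean=-0.5 * self.sigma**2, sigma=self.sigma, size=n)

def simulate(policy, solution_process, n_agents=10_000, shock_process=None, seed=0):
    """Default: reuse the discretized process from the solution (rational expectations).
    Pass shock_process to substitute any other object with a draw(n, rng) method."""
    process = shock_process if shock_process is not None else solution_process
    shocks = process.draw(n_agents, np.random.default_rng(seed))
    return policy(shocks)   # stands in for the actual forward simulation

# Rational-expectations default vs. a continuous-shock counterfactual:
solved_with = DiscretizedLognormal(sigma=0.1, n_points=7)
baseline = simulate(np.mean, solved_with)
counterfactual = simulate(np.mean, solved_with, shock_process=ContinuousLognormal(0.1))
```

The point of the sketch is only the shape of the interface: the rational-expectations behavior is what you get by doing nothing, and deviating from it requires an explicit, visible substitution.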