Bayesian Optimization for Structural Estimation

Question

Closed this issue a year ago · 9 comments

This article outlines Bayesian Optimization, a method of computing $x$ that optimizes $f(x)$ when $f(x)$ is hard to sample and non-differentiable.

https://towardsdatascience.com/the-beauty-of-bayesian-optimization-explained-in-simple-terms-81f3ee13b10f

It looks like this might be an improvement on our current grid-based way of doing structural estimation.

[I learned about this technique in a conversation with other colleagues about surrogate models. It accomplishes some of what surrogate models accomplish, but the algorithm is simpler and, I believe, requires somewhat less infrastructural overhead because it depends on Gaussian Processes rather than a full-fledged neural network architecture.]

Answer 1 · 2023-06-22T15:02:14.000Z

I'm not sure the article really explains or justifies the method well. It begins by saying that the function is expensive to compute, but then proposes a method that samples the parameter space in a semi-disorganized way. The article also says that the derivative "isn't known", not that it's not differentiable. It's not clear whether it's meant for R^N problems, with the graphics there only as readable examples, or whether it's just for optimization on R; the author never says.

Sentences like this also make me very suspicious of whether the author knows what they're talking about (emphasis added): "After a certain number of iterations, we’re **destined to arrive** at a global minima, unless the function’s shape is very bizarre..."

In higher-dimensional parameter spaces, the number of queried points needed to generate an updated surrogate function would be very large.

More generally, I don't think many structural papers use a grid search for their parameters. A variety of search and optimization methods are used, but a grid search would only be done in 1 or 2 dimensions as an initial check on parameters about which the econometrician knows very little.

Answer 2 · 2023-06-22T15:47:00.000Z

> then proposes a method that samples the parameter space in a semi-disorganized way.

I disagree with this. The sampling strategy is well motivated and more or less the point.
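
To make the sampling strategy concrete, here is a minimal sketch of the loop, assuming a one-dimensional parameter, a Gaussian Process surrogate with a fixed squared-exponential kernel, and a lower-confidence-bound acquisition rule. None of this comes from the article or from HARK: `expensive_objective`, the kernel length scale, and `kappa` are all hypothetical choices made for illustration.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.3):
    """Squared-exponential kernel between two 1-D arrays of points."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_posterior(x_obs, y_obs, x_new, noise=1e-6):
    """Posterior mean and std. dev. of a zero-mean, unit-variance GP at x_new."""
    k_oo = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_on = rbf_kernel(x_obs, x_new)
    solved = np.linalg.solve(k_oo, k_on)
    mean = solved.T @ y_obs
    var = 1.0 - np.sum(k_on * solved, axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))


def expensive_objective(x):
    """Hypothetical stand-in for a costly estimation objective."""
    return (x - 0.7) ** 2 + 0.1 * np.sin(8.0 * x)


def bayes_opt(f, lower, upper, n_init=3, n_iter=12, kappa=2.0):
    """Minimize f on [lower, upper] using n_init + n_iter evaluations."""
    candidates = np.linspace(lower, upper, 201)
    x_obs = np.linspace(lower, upper, n_init)
    y_obs = np.array([f(x) for x in x_obs])
    for _ in range(n_iter):
        # Standardize outputs so the zero-mean, unit-variance prior is sensible.
        y_scaled = (y_obs - y_obs.mean()) / (y_obs.std() + 1e-12)
        mean, std = gp_posterior(x_obs, y_scaled, candidates)
        # Lower confidence bound: favor points that look good (low mean)
        # or that are still poorly explored (high std).
        acquisition = mean - kappa * std
        x_next = candidates[np.argmin(acquisition)]
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, f(x_next))
    best = np.argmin(y_obs)
    return x_obs[best], y_obs[best], len(y_obs)


best_x, best_y, n_evals = bayes_opt(expensive_objective, 0.0, 1.0)
print(f"best x = {best_x:.3f}, f = {best_y:.4f}, evaluations = {n_evals}")
```

Each new evaluation goes where the surrogate predicts either a low value or high uncertainty, which is why the sampled points look irregular rather than gridded. A larger `kappa` pushes the search toward unexplored regions; a smaller one concentrates it near the current best point.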

> The article also says that the derivative "isn't known", not that it's not differentiable.

Fair enough. That was perhaps my error in representing the article, which, granted, is a readable public intro to the method, not a formal description or proof of its validity.

> I don't think many structural papers use a grid search for their parameters.

Is there a concrete example of a structural estimation algorithm you would point to as being superior to Bayesian Optimization?

My understanding is that DistributionOfWealthMPC, the current estimation exemplar in the HARK ecosystem, just uses a grid search.

Yesterday, we discussed how we would like HARK to logically separate several different aspects of its functionality:

- Configuration Files (DARK files?)
- Model objects
- Solvers (e.g. packaging generic algorithms, and also hand-crafted options)
- Simulators (e.g. transition matrix, Monte Carlo)

I think we should add 'structural estimators' to this list. Depending on the model, and the computational expense of the viable solvers, and the available simulators, you would want to use different Estimators.

Bayesian Optimization strikes me as one good general Estimation technique worth including in the library. Please tell me if you have a better one in mind!

Answer 3 · 2023-06-22T16:03:13.000Z

As published, DistributionOfWealthMPC is an oddball estimation. There are two parameters, and one of them isn't free; it's pinned down because one data feature is specified to need to match exactly (or at least to 5-ish digits of accuracy). So there's one free parameter and four moments it's trying to hit.

The optimization method is *extremely* inefficient, as it performs a nested search: for every proffered value of parameter 2, it searches over values of parameter 1 to exactly match data feature A; each value of (parameter 1, parameter 2) is very costly to evaluate. IIRC, the search over parameter 1 to get data feature A to match uses Newton's method, and the outer search over parameter 2 uses something like Newton's method. There are several ways that search could be greatly accelerated, some of which have been identified by HARK contributors.

The only reason the HARK parameter search is run like this is that it was required to use the same method as legacy Mathematica code from an earlier iteration of the project. There's no mathematical, computational, or economic justification.

From my own work and my understanding of what others do, structural parameter estimation is conducted by a combination of steepest descent, Newton or quasi-Newton methods, and polytope methods. If the model is (approximately) correctly specified and the objective function is well designed to identify its parameters, then the objective function should be continuous-ish and smooth-ish.
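
The shape of that nested search can be sketched on a toy problem. Everything here is hypothetical: `simulate_moments` is a cheap stand-in rather than the DistributionOfWealthMPC model, the targets are made up, and the outer loop uses a golden-section search in place of the "something like Newton's method" described above. The inner loop is Newton's method with a finite-difference slope.

```python
import math

CALLS = {"model": 0}  # counts how often the (supposedly costly) model is run

TARGET_A = 3.0                         # feature that must match exactly
TARGET_OTHERS = [3.0, 2.0, 1.0, 1.0]   # four moments to get close to


def simulate_moments(p1, p2):
    """Hypothetical model: returns (feature_a, list of four other moments)."""
    CALLS["model"] += 1
    feature_a = p1 * (1.0 + 0.5 * p2)
    others = [p1 + p2, p1 * p2, p2 ** 2, p1 - p2]
    return feature_a, others


def solve_inner(p2, p1_start=1.0, tol=1e-8, max_iter=50):
    """Newton search over p1 so that feature A matches its target for this p2."""
    p1, step = p1_start, 1e-6
    for _ in range(max_iter):
        gap = simulate_moments(p1, p2)[0] - TARGET_A
        if abs(gap) < tol:
            return p1
        gap_shifted = simulate_moments(p1 + step, p2)[0] - TARGET_A
        p1 -= gap / ((gap_shifted - gap) / step)
    raise RuntimeError("inner search did not converge")


def outer_distance(p2):
    """Squared distance on the other moments, after pinning p1 via feature A."""
    p1 = solve_inner(p2)
    _, others = simulate_moments(p1, p2)
    return sum((m - t) ** 2 for m, t in zip(others, TARGET_OTHERS))


def golden_section(f, lo, hi, tol=1e-5):
    """Derivative-free 1-D minimizer for a unimodal function on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2.0


p2_hat = golden_section(outer_distance, 0.0, 3.0)
p1_hat = solve_inner(p2_hat)
print(f"p1 = {p1_hat:.4f}, p2 = {p2_hat:.4f}, model runs = {CALLS['model']}")
```

The `CALLS` counter makes the cost visible: every outer evaluation triggers several model runs in the inner loop, so the total grows multiplicatively even in this two-parameter toy.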

Answer 4 · 2023-06-22T16:09:13.000Z

It sounds like Bayesian Optimization is exotic to economics, but that the idea that we should support Estimators is valid.
It would be interesting to compare Bayesian Optimization directly with Newton and polytope methods on different models.

Answer 5 · 2023-06-22T16:34:53.000Z

We can have different estimators/optimizers, and some of that is already built into HARK.optimization. We should get more examples of structural estimation made up as REMARKs.

Answer 6 · 2023-06-22T16:50:36.000Z

Hmmm.

Do you mean HARK.estimation? (I see from the history that this was renamed from HARKestimation in 2018):
https://github.com/econ-ark/HARK/blob/master/HARK/estimation.py

It looks like these are implementations of generic optimization algorithms.
I wonder:

- if/where they are used in the HARK library
- how they perform relative to external implementations such as https://scikit-optimize.github.io/stable/

I think I'm suggesting something slightly different, which would be to provide a standard interface for estimating models that builds on what a Model is, so that different techniques can be directly compared. Perhaps that is too far out to be actionable at this point.

Answer 7 · 2023-06-22T17:07:46.000Z

Right, those are generic optimization algorithms. I won't rule it out, but having an "interface for estimating models" is something that was discussed early on, and sort of ruled as outside of the scope of HARK. Every particular application / research project is going to have a different set of moments / features they're trying to match, some of them pretty bespoke. What they all add up to is some mapping from parameter vector to objective function output, a scalar. That mapping can be plugged into an arbitrary optimizer: one of the generic ones, a special handcrafted one, or something different like Bayesian optimization.
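
As a rough illustration of that last point, the sketch below wraps a made-up model in a single scalar objective and hands the same callable to two generic optimizers from SciPy: a polytope method (Nelder-Mead) and a quasi-Newton method (BFGS). `model_moments` and `TARGET_MOMENTS` are hypothetical, and this uses `scipy.optimize.minimize` rather than anything in HARK.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical empirical moments the model should match.
TARGET_MOMENTS = np.array([2.5, 1.0, 1.5])


def model_moments(params):
    """Hypothetical model: maps a parameter vector to simulated moments."""
    a, b = params
    return np.array([a + b, a * b, a - b])


def objective(params):
    """Parameter vector in, scalar distance out. Optimizers only see this."""
    gap = model_moments(np.asarray(params, dtype=float)) - TARGET_MOMENTS
    return float(gap @ gap)


start = np.array([1.0, 1.0])

# The same callable works with any optimizer that accepts f(params) -> scalar.
results = {
    method: minimize(objective, start, method=method)
    for method in ("Nelder-Mead", "BFGS")
}

for method, res in results.items():
    print(f"{method}: params = {np.round(res.x, 4)}, evaluations = {res.nfev}")
```

Because the optimizer only needs a function from parameters to a scalar, comparing techniques on a given application comes down to swapping the `method` argument, or passing `objective` to a different library, and counting evaluations.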

Answer 8 · 2023-06-22T17:26:05.000Z

Hmm. Interesting. I hadn't realized that this was deemed out of scope for HARK.

The other day CDC asked me to look at uses of deep surrogate models for estimation, in connection to this paper (https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3782722), so maybe the scope has changed.

What I've learned (fortuitously, through conversations around the SHARKFin project) is that there is more to surrogate modeling than "deep" surrogate modeling. Perhaps "deep" surrogate modeling is just the surrogate modeling technique du jour.

Here is the scikit-optimize documentation for their Bayesian Optimization implementation, which may be more mathematically specific than the Medium post, along with a comparison of surrogate models.

https://scikit-optimize.github.io/stable/auto_examples/bayesian-optimization.html

https://scikit-optimize.github.io/stable/auto_examples/strategy-comparison.html

Maybe you're right that this is better demonstrated in a REMARK, DEMARK, or example rather than included in the library.

I'll close out this issue, since there's clearly higher priority stuff to do in the short term. Thanks for talking me through it.

Answer 9 · 2023-06-22T20:35:05.000Z

Seb, It's not really out of scope, but you have plenty on your plate right now and this is a much lower priority than the other things. Bookmark it for later discussion and move on.

-- - Chris Carroll