Vignette for large/complex model workflow
Opened this issue · 9 comments
Main ideas:
- Can start with ADVI/optimization
- Can (sometimes) use reduced data/model (but then reduced validity, still better than nothing)
- I.e. just one random effect, just three states in a HMM, ...
- How to learn something even from a single fit? (will need some additional code/support)
- Merge all the ranks together
- z-scores (or equivalent)
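The single-fit idea above (pooling ranks and z-scores across parameters) could look roughly like the sketch below. It assumes simulated data with known "true" parameter values and a matrix of posterior draws. The helper name `single_fit_summary` and the toy posterior are made up for illustration and are not part of the SBC package. Ranks pooled within one fit are generally correlated across parameters, so this is a rough screen rather than a full SBC check.

```python
import numpy as np


def single_fit_summary(draws, true_values):
    """Summarize one fit against known true values (hypothetical helper).

    draws: array of shape (n_draws, n_params) with posterior draws.
    true_values: array of shape (n_params,) used to simulate the data.
    Returns (ranks, z_scores), one entry per parameter.
    """
    draws = np.asarray(draws, dtype=float)
    true_values = np.asarray(true_values, dtype=float)
    # Rank of each true value among its posterior draws, in 0..n_draws.
    ranks = (draws < true_values).sum(axis=0)
    # Posterior z-score: (posterior mean - truth) / posterior sd.
    z_scores = (draws.mean(axis=0) - true_values) / draws.std(axis=0, ddof=1)
    return ranks, z_scores


# Toy example: a well-calibrated "posterior" for 50 parameters (assumption).
rng = np.random.default_rng(0)
n_draws, n_params, post_sd = 200, 50, 0.5
true_values = rng.normal(size=n_params)
centers = true_values + rng.normal(scale=post_sd, size=n_params)
draws = centers + rng.normal(scale=post_sd, size=(n_draws, n_params))

ranks, z_scores = single_fit_summary(draws, true_values)
pooled_ranks = ranks / n_draws  # merged across parameters, scaled to [0, 1]
print("mean z-score:", z_scores.mean())
print("pooled rank histogram:", np.histogram(pooled_ranks, bins=5, range=(0, 1))[0])
```

If the pooled ranks pile up near 0 or 1, or the z-scores are far from zero on average, that points to a problem worth investigating before spending compute on many fits.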
I would be excited to push this forward if you think one of the following (preferably their intersection) can be considered a complex model from your point of view. The links are related issues for each module.
@Dashadower will first attack hierarchical and @hyunjimoon will attack high dimension.
Building on @martinmodrak's small workflow, @tomfid and I will write a document on Bayesian workflow for an ODE model of urban dynamics (mdl file).
This satisfies at least two of the categories above (ODE, high dimension). Moreover, Vicky's research on urban scaling, which regresses the log of a city index (such as congestion or income inequality) on the log of city population, opens a path toward hierarchical Bayesian modeling. In short, hierarchical Bayesian start from viewing
Considering the `priordb` project, where realistic values of parameters that could be assumed are collectively learned.
We will include:
Must
1. Classifying parameters into three types: assumed parameter, assumed parameter time-series, estimated parameter.
2. Justification of prior specification and its automation
3. Three checks: prior predictive, posterior predictive, simulation-based calibration
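As a rough illustration of the third check, here is a minimal simulation-based calibration loop. It is a sketch under strong assumptions: a toy conjugate model (prior `mu ~ Normal(0, 1)`, data `y ~ Normal(mu, 1)`) whose posterior is known exactly, so no sampler is needed. With a real dynamic model the exact posterior draw would be replaced by a Stan fit. All names and sizes here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n_sims, n_obs, n_post = 500, 10, 99

ranks = np.empty(n_sims, dtype=int)
for s in range(n_sims):
    mu_true = rng.normal(0.0, 1.0)             # draw a parameter from the prior
    y = rng.normal(mu_true, 1.0, size=n_obs)   # simulate data (prior predictive draw)
    # Exact conjugate posterior for this toy model (stands in for an MCMC fit).
    post_var = 1.0 / (n_obs + 1.0)
    post_mean = y.sum() * post_var
    post = rng.normal(post_mean, np.sqrt(post_var), size=n_post)
    ranks[s] = (post < mu_true).sum()          # rank statistic in 0..n_post

# With 99 posterior draws there are 100 possible ranks; group them into 10 bins.
counts = np.bincount(ranks // 10, minlength=10)
print("rank counts per bin (expected about", n_sims // 10, "each):", counts)
```

If the estimator is calibrated, the bin counts should look roughly uniform. A U-shape or a slope in the histogram suggests the posterior is too narrow or biased.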
For 2, the sections "Multiplicative error and the lognormal distribution, Weakly informative priors, Priors for system parameters and noise scale" from this case study on population dynamics are a good place to start for choosing the distribution and parameters of a prior. This corresponds to "Specify_implicit" (H5.abc) from this Human-Machine collaboration table (HMC table).
We are considering including:
Option
4. Demand prior elicitation to policy function and its optimization for policy prescription
5. Comparing different `posterior approximator` modules (MCMC, variational inference, optimization)
For 4, translating Vensim's `.vpd` to a Stan model block is the key, as we can then use its optimization engine, as in this restaurant revenue optimization example.
For 5, the aim is to find the cheaper(-est) computation that reaches conditioned precision (step 9 from HMC table).
Tool
`stan_builder`, which I am developing with @Dashadower on @JamesPHoughton's Pysd (currently on the stan-backend branch, pull-requested here)
- One example on prey-predator here, which includes 1, 2, 3 above.
- A more complicated example on stock management here, with at least ten assumed parameters (will be updated by 9/5)
Ref
- Workflow sequence conditional on data and model by Mike Betancourt here
@jandraor and I are trying this with three example models in Data4DM/BayesSD#76. @tomfid's help, especially regarding inferencedata, is valuable, as Vensim supporting this format would be crucial for connecting Vensim subscripts with hierarchical Bayesian models.
Also, @OriolAbril and @ahartikainen are helping to connect this to arviz. Thanks!
@martinmodrak @Dashadower, could the current SBC R library's output be easily transformed to inferencedata by any chance? Or is there any reference code we could look at, e.g. previous attempts by our community to connect `posterior` and `arviz`? @jandraor and I are using different languages (R, Python) and wondered whether we could pool our plotting efforts by having a modularized data structure.
this issue stan-dev/posterior#85 sounds relevant to interoperability
Below is a rough plan that I felt was needed for the large model workflow. An enjoyable milestone would be a Bayesian workflow case study for dynamic models (prey-predator, SEIR, inventory management) by around March 2023. Thank you very much, all!
Goal: bridging Vensim ecosystem with Stan ecosystem
- provide an efficient (gradient-based) and effective (diagnostics) HMC estimator for the dynamic model (generator)
- consistency among data (`.nc`), model (`.stan`), plots (`.png`)
- template for simulation-based calibration checks, e.g. this python file
For this, I am trying to:
1. connect stanify with Dynamic simulation scenario 1, 2 (with @tomfid, @enekomartinmartinez, @tseyanglim, @JamesPHoughton's support)
2. combine 1's result by putting many `.nc` files into one `sbc.nc` (with @Dashadower, @OriolAbril, @ahartikainen's support)
3. connect `.nc` output with the SBC package via the `rvar` concept (with @paul-buerkner, @martinmodrak, @jandraor's support)
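The step of putting many `.nc` files into one `sbc.nc` could be sketched with xarray as below. This is only an illustration under assumptions: the variable name `theta`, the new dimension name `prior_draw`, and the in-memory stand-ins for estimator outputs are all hypothetical, and the actual stanify/arviz layout may differ.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(2)
n_prior_draws, n_chains, n_draws = 3, 4, 100


def fake_estimator_output():
    """Stand-in for one estimator result (hypothetical variable `theta`)."""
    return xr.Dataset(
        {"theta": (("chain", "draw"), rng.normal(size=(n_chains, n_draws)))},
        coords={"chain": np.arange(n_chains), "draw": np.arange(n_draws)},
    )


# One dataset per prior draw. With real files this might instead be
# xr.open_dataset(path, group="posterior") for each path, assuming the
# files follow the arviz InferenceData layout.
fits = [fake_estimator_output() for _ in range(n_prior_draws)]

# Stack the fits along a new leading dimension, one index per prior draw.
sbc = xr.concat(fits, dim="prior_draw").assign_coords(
    prior_draw=np.arange(n_prior_draws)
)
print(sbc["theta"].dims, sbc["theta"].shape)

# Writing the combined file needs a netCDF backend (e.g. netCDF4 or h5netcdf):
# sbc.to_netcdf("sbc.nc")
```

Keeping the prior-draw index as a named dimension means rank statistics can later be computed per prior draw with ordinary xarray reductions over `chain` and `draw`.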
Dynamic simulation scenario (Vensim)
outputs netcdf format (`.nc`). Scenarios to reach `.nc`:
scenario 1) Vensim/Stella user on Python
- pysd translates `.mdl`, `.xml` to python objects (supports Stella, which stanify lacks now)
- pysd can output synthetic data in `generator.nc`
- @enekomartinmartinez, @JamesPHoughton maintain pysd
scenario 2) Pure Vensim user
- Vensim `.mdl` is considering supporting the `.nc` format; if this happens, it can output `generator.nc` and `estimator.nc`
- Vensim has an internal MCMC, but the wish is to connect it to HMC
- @tomfid is Vensim CTO and @tseyanglim has expertise in connecting Vensim hierarchical models with python
stanify (scenario 1 or 2)
1. stanify translates `.mdl` to `.stan` and outputs one `generator.nc` and one `estimator.nc` for the baseline case (no hierarchy, no `prior_draw`s)
2. stanify outputs one `generator.nc` and three (n_prior_draws) `estimator.nc` files for SBC
3. stanify outputs one `generator.nc` and two (n_subgroups) `estimator.nc` files for a hierarchical model

- @Dashadower is an SBC developer; @OriolAbril and @ahartikainen are arviz developers
- arviz's wrapper or multi-index for the posterior group can streamline 2, 3, which can be added to @OriolAbril's sbc-cmdstanpy
Computational statistician (Stan)
transform `.nc` to `rvars`, which the SBC package supports. Three verifications needed:
- aria code translates `.nc` to `rvars` here (@mike-lawrence, could you confirm this?)
- SBC supports `rvars` here (@martinmodrak, could you confirm this?)
- `rvar`-based Bayes visualization + empirical coverage plot is explainable enough for policy specification on a dynamic model (@jandraor, could you confirm this?). Matthew's rvars explains how `rvar` can make visualization easy by grouping random variables, which may be relevant to inferencedata's data variable concept.