morphoorg/morpho

Processors Facilitating Stan Ensemble Runs and Sensitivity Analysis


Batch/ensemble analyses should run in two steps, each a separate call to morpho:

  1. Prepare and analyze data (either real or pseudo), producing posteriors;
  2. Compute and present metrics that depend on results of the entire ensemble of runs.

For Step 1, a processor should have the following capabilities:

  • Calls another processor that randomly samples input values for the data generator from the priors and saves them to the appropriate file, so that coverages and similar metrics can be computed later. (This should be an automatic part of all sensitivity analyses.) The processor could be based on morpho/morpho/preprocessing/sample_inputs.py in morpho1.
    • If necessary, includes an option to compute transformed inputs from the sampled inputs before data generation.
  • Optionally resets Stan initialization and sampling parameters (like iter) based on inputs sampled from priors. This is possible in the spectrum_analysis branch of morpho1.
  • Generates (or loads) and analyzes a specified number of data sets. This could be configured to run the analyses in parallel on a cluster, as in scripts/morpho_models/python_scripts/ensemble_runs.py (which I am cleaning up at the moment).
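
As a rough illustration of the input-sampling part of Step 1, here is a minimal sketch in plain Python. It is not morpho's API: the prior specification format, the parameter names (`mass`, `background`), and the helper functions are all hypothetical, and a real processor would read its priors from the morpho configuration.

```python
import json
import random

# Hypothetical prior specification; names and distributions are illustrative only.
PRIORS = {
    "mass": {"dist": "uniform", "low": 0.0, "high": 2.0},
    "background": {"dist": "normal", "mu": 1.0, "sigma": 0.1},
}


def sample_inputs(priors, rng):
    """Draw one value per parameter from its prior."""
    draws = {}
    for name, spec in priors.items():
        if spec["dist"] == "uniform":
            draws[name] = rng.uniform(spec["low"], spec["high"])
        elif spec["dist"] == "normal":
            draws[name] = rng.gauss(spec["mu"], spec["sigma"])
        else:
            raise ValueError(f"unsupported prior: {spec['dist']}")
    return draws


def prepare_ensemble(priors, n_runs, seed=0):
    """Sample inputs for every run up front so they can be saved with the results."""
    rng = random.Random(seed)
    return [{"run": i, "inputs": sample_inputs(priors, rng)} for i in range(n_runs)]


def save_ensemble(records, path):
    """Write the sampled inputs to a JSON file for use in Step 2."""
    with open(path, "w") as f:
        json.dump(records, f, indent=2)
```

Saving the sampled inputs before any data set is generated means each run can later be compared against the values that produced it, which is what the coverage computation in Step 2 needs.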

For Step 2, we should have the following features (processors):

  • Compute and report posterior credible intervals and coverages.
  • Plot posterior means and credible intervals as a function of inputs for a given parameter. This is often the simplest way to summarize the results of Bayesian inference.
  • Compute posterior shrinkages (S) and z-scores (Z); plot S vs. Z for a given parameter. These plots help distinguish overfitting, prior/posterior conflict, and poor model specification from good model behavior.
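
The Step 2 metrics could be computed along the following lines. This is only a sketch, with hypothetical function names, and it assumes the commonly used definitions, which this issue does not spell out: Z = (posterior mean - true input) / posterior standard deviation, and S = 1 - posterior variance / prior variance. The credible interval is an equal-tailed interval taken from sorted posterior draws.

```python
import math
import statistics


def credible_interval(draws, level=0.9):
    """Equal-tailed interval from posterior draws, using order statistics."""
    ordered = sorted(draws)
    n = len(ordered)
    tail = (1.0 - level) / 2.0
    # Round outward so the interval is never narrower than requested.
    lo_idx = max(0, math.floor(tail * (n - 1)))
    hi_idx = min(n - 1, math.ceil((1.0 - tail) * (n - 1)))
    return ordered[lo_idx], ordered[hi_idx]


def coverage(intervals, truths):
    """Fraction of runs whose interval contains the true input value."""
    hits = sum(1 for (lo, hi), t in zip(intervals, truths) if lo <= t <= hi)
    return hits / len(truths)


def shrinkage_and_zscore(draws, truth, prior_sd):
    """Return (S, Z) for one parameter in one run, under the assumed definitions."""
    post_mean = statistics.fmean(draws)
    post_var = statistics.variance(draws)  # sample variance of the draws
    shrinkage = 1.0 - post_var / prior_sd ** 2
    z_score = (post_mean - truth) / math.sqrt(post_var)
    return shrinkage, z_score
```

Applied to every run in the ensemble, `shrinkage_and_zscore` yields one (S, Z) pair per run; scattering those pairs gives the S vs. Z plot, and `coverage` can be compared against the nominal interval level.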