stan-dev/posterior

Add Pareto-khat diagnostic

avehtari opened this issue · 10 comments

Pareto-khat diagnostic can be used in general for any Monte Carlo expectation estimate (not just importance sampling)

I'm drafting different functions and their arguments here. Some of these are based on the forthcoming PSIS paper revision.

  • Copy gpdfit() from loo package (just point estimates)
  • Add new gpdpost() with different marginal and joint posteriors, see stan-dev/loo#188
  • New pareto_khat() function with support for draws objects, with arguments
    • draws
    • M = 3*sqrt(ndraws) or ndraws_tail = 3*sqrt(ndraws), the number of draws in the tail
    • tail='right', can be 'right', 'left', or 'both'; in the case of 'both', the maximum k is returned
    • r_eff=1, relative efficiency from MCMC, can be a scalar or vector value, or NULL to compute r_eff from draws

      returns
    • k
    • sigma (probably not needed, but useful for completeness)
    • minimum sample size for reliable Pareto smoothed estimate given k
    • k thresholds for reliable Pareto smoothed estimate given sample size and r_eff
    • estimated Pareto smoothed estimate RMSE convergence rate from k
    • estimated efficiency computed from k, S, and r_eff
    • diagnostic print provides informative messages: how big a sample size would be needed for reliable estimates, and how many draws would be needed to halve the RMSE
  • New ps_min_ss(k) returns minimum sample size for reliable Pareto smoothed estimate
  • New ps_khat_threshold(S) function for computing khat-threshold for reliable Pareto smoothed estimate given sample size
  • New ps_convergence_rate(k, S) function for computing the Pareto smoothed estimate RMSE convergence rate from k and S
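The tail = 'right' / 'left' / 'both' behaviour described above could be sketched as follows. Here khat_fn stands in for whatever tail-fitting routine (e.g. a gpdfit() wrapper) is eventually used, so this only illustrates the argument handling, not the estimator itself:

```r
# Sketch of the proposed tail selection logic; khat_fn is a placeholder
# for the actual Pareto-k estimator, NOT part of the proposal.
pareto_khat_tail <- function(x, tail = c("both", "right", "left"),
                             khat_fn) {
  tail <- match.arg(tail)
  k_right <- if (tail != "left") khat_fn(x) else -Inf
  # the left tail is fit by negating the draws
  k_left <- if (tail != "right") khat_fn(-x) else -Inf
  # for tail = "both", the larger (worse) of the two k's is returned
  max(k_right, k_left)
}
```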

@paul-buerkner can you help to name the functions?

pinging also @Ozan147

EDIT: edited option names
EDIT: edited names
EDIT: edited typo log -> sqrt

Draft of functions

#' Given Pareto-k computes the minimum sample size for reliable Pareto
#' smoothed estimate (to have small probability of large error)
ps_min_ss <- function(k) {
  # Eq (11) in PSIS paper
  10^(1/(1-max(0,k)))
}

#' Given sample size S computes khat threshold for reliable Pareto
#' smoothed estimate (to have small probability of large error)
ps_khat_threshold <- function(S) {
  1-1/log10(S)
}

#' Given S and scalar or array of k's, compute the relative
#' convergence rate of PSIS estimate RMSE
ps_convergence_rate <- function(k, S) {
  # allow an array of k's
  rate <- numeric(length(k))
  # k < 0: bounded distribution
  rate[k < 0] <- 1
  # k > 1: non-finite mean
  rate[k > 1] <- 0
  # limit value at k = 1/2
  rate[k == 0.5] <- 1 - 1/log(S)
  # smooth approximation for the rest (see Appendix of PSIS paper);
  # the formula also covers k = 0, where it evaluates to 1
  ki <- (k >= 0 & k < 1 & k != 0.5)
  kk <- k[ki]
  rate[ki] <- pmax(0, (2*(kk-1)*S^(2*kk+1) + (1-2*kk)*S^(2*kk) + S^2) /
                     ((S-1)*(S - S^(2*kk))))
  rate
}

#' Given scalar k and S, form a diagnostic message string
pareto_k_diagmsg <- function(k, S) {
  msg <- paste0('With k=', round(k,2), ' and S=', round(S,0), ':\n')
  if (k > 1) {
    msg <- paste0(msg,'All estimates are unreliable. If the distribution of ratios is bounded,\nfurther draws may improve the estimates, but it is not possible to predict\nwhether any feasible sample size is sufficient.')
  } else {
    if (k > ps_khat_threshold(S)) {
      msg <- paste0(msg, 'S is too small, and a sample size larger than ', round(ps_min_ss(k),0), ' is needed for reliable results.\n')
    } else {
      msg <- paste0(msg, 'To halve the RMSE, an approximately ', round(2^(2/ps_convergence_rate(k,S)),1), ' times larger S is needed.\n')
    }
  }
  if (k > 0.7 && k < 1) {
    msg <- paste0(msg, 'Bias dominates RMSE, and the variance-based MCSE is underestimated.\n')
  }
  message(msg)
}
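As a quick sanity check of the formulas above (the two one-line helpers are repeated here so the snippet runs on its own):

```r
# Definitions repeated from the draft above so this runs standalone.
ps_min_ss <- function(k) 10^(1/(1 - max(0, k)))
ps_khat_threshold <- function(S) 1 - 1/log10(S)

ps_min_ss(0.5)          # 10^(1/0.5) = 100
ps_khat_threshold(1000) # 1 - 1/log10(1000) = 2/3
```

So with k = 0.5 a sample size of at least 100 is needed, and with S = 1000 draws a khat up to about 0.67 is still within the threshold.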

After some discussion with @avehtari I've been working on cleaning up the functions added in @OzanStats's work.

@paul-buerkner one thing to get your opinion on:
Would it make sense for this pareto_khat function, which returns a number of things (k, convergence rate, min ss, etc.), to be usable in summarise_draws, creating multiple columns in the summary? Or should functions generally return a single column (quantile2 being the exception)?

Both work. Could there perhaps be two functions? One that just returns khat, and an extended version with diagnostics? I prefer that pareto_khat just return the khat value. After all, we do not automatically provide more info for rhat and friends either. But then there could be another function named, say, pareto_khat_extended (please think about whether you can come up with a better name), which has the multi-column output you mentioned. What do you think?

Maybe the alternative is to have a pareto_k_diagnostic_measures() like default_summary_measures()
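A minimal sketch of what that could look like, by analogy with default_summary_measures() in posterior (which returns a character vector of measure names). The measure names below are placeholders, since the final function names were still being discussed:

```r
# Hypothetical sketch: a character vector of diagnostic measure names
# that could be passed to summarise_draws(), mirroring the shape of
# default_summary_measures(). The names are placeholders.
pareto_k_diagnostic_measures <- function() {
  c("pareto_khat", "min_ss", "khat_threshold", "convergence_rate")
}
```

summarise_draws(x, pareto_k_diagnostic_measures()) would then add one column per measure, assuming functions with those names exist.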

Ok, I'll try a few options and add some example outputs here

I've now implemented a preliminary version of the tail selection and was thinking about the use cases for "right" vs "left" vs "both" when assessing Monte Carlo draws, with h(theta) = theta, and r(theta) = 1.

Specifically, I was wondering whether the tails should be fit on the constrained or unconstrained draws, and whether the resulting convergence rates should be consistent. (Note that currently r_eff = 1.)

For example, for the tau variable from eight schools, I get different thresholds / convergence rates for tau and log(tau), @avehtari is this expected and if so which should be used?

# first 20 draws of example_draws()
# left tail (does this even make sense for tau as it is lower-bounded?)
  variable pareto_k_l min_ss_l khat_threshold_l convergence_rate_l     S
  <chr>         <dbl>    <dbl>            <dbl>              <dbl> <dbl>
1 tau         -0.349      10              0.231              1        20
2 log_tau      0.0389     11.0            0.231              0.991    20

# right tail
  variable pareto_k_r min_ss_r khat_threshold_r convergence_rate_r     S
  <chr>         <dbl>    <dbl>            <dbl>              <dbl> <dbl>
1 tau          0.478      82.6            0.231              0.740    20
2 log_tau      0.0670     11.8            0.231              0.983    20

Specifically I was wondering whether the tails should be fit on the constrained or unconstrained draws

Should be fit on the scale that is reported

and if the resulting convergence rates should be consistent

No. E.g., the tails of a log-transformed variable are much shorter.

I get different thresholds / convergence rates for tau and log(tau), @avehtari is this expected and if so which should be used?

It is expected and the one that is reported should be used.

left tail (does this even make sense for tau as it is lower-bounded?)

It does make sense, and bounded tails have k<0. But it is usually not that interesting to report, which is why it would be fine to report the bigger of left and right (although that will cause some overestimation).

Thanks for the answers, the use is clearer now. Then I suggest that the default be tail = "both", so when it is used in summarise_draws() for all variables, it covers bounded and unbounded variables without needing to keep track of the supports.

Then I suggest that the default be tail = "both"

I agree.