Signal strength, Error assumptions, and MCMC convergence
Opened this issue · 14 comments
Hi,
I'm using the state-space Bayesian partial-pooling approach for my model and I have a few questions:
- Does the model report the coefficient of determination, R2det? If it does, how can I access it?
- When dealing with data with a low metabolism signal (low diel change in DO), I found that the process error terms made the DO fluctuation even smaller. I have tried disregarding process error entirely (setting "err_proc_iid = FALSE" in mm_name) and modifying "err_proc_iid_sigma_scale" in specs, but the process errors were either 0 or still large. Is there any way to modify the standard deviation of the process error (err_proc_iid_sigma?) in my partial-pooling model?
- To check whether the MCMC chains converged, I looked for the Rhat of the standard deviation of K600 in the model outputs but could not find it in get_fit (I did see Rhat for the K600 mean and K600 predlog) or in other functions. Does the model provide this information?
Thank you so much,
Tzu-Yao
Hi Bob,
- Does the model provide information of coefficient of determination, R2det? If it does, how can I access it?
- I just found that there is a DO_R2 output from the get_fit function (get_fit(mm) %>% lapply(names)), as shown in http://usgs-r.github.io/streamMetabolizer/articles/get_started.html. I'm not sure whether that DO_R2 is the R2det, but regardless, I don't find such information in my model output (see below).
- When dealing with data with a low metabolism signal (low diel change in DO), I found that the process error terms made the DO fluctuation even smaller. I have tried disregarding process error entirely (setting "err_proc_iid = FALSE" in mm_name) and modifying "err_proc_iid_sigma_scale" in specs, but the process errors were either 0 or still large. Is there any way to modify the standard deviation of the process error (err_proc_iid_sigma?) in my partial-pooling model?
- I understand that low diel change in DO is deadly for estimating gas exchange, so I'm thinking of making the signal stronger by limiting process error, because when process error is large, the signal can get even worse. I tested this in my model, and GPP and K600 really did increase significantly after I set "err_proc_iid = FALSE" in mm_name (i.e., did not consider process error at all). However, this led to positive biases (GPP and K600 got "too large"), as stated in the Discussion section of Appling, Hall, Yackulic & Arroita (2018). This is why I'm wondering whether I can lower the process error without disregarding it completely. The problem is, the only lever I could find for modifying process error was "err_proc_iid_sigma_scale" in the specs function, and it had limited effect on the process error in my model.
My model specifications:
model_name b_Kb_oipi_tr_plrckm.stan
engine stan
split_dates FALSE
keep_mcmcs TRUE
keep_mcmc_data TRUE
day_start 5
day_end 29
day_tests full_day, even_timesteps, complete_data, pos_discharge
required_timestep NA
K600_lnQ_nodes_centers -10, -9.8, -9.6, -9.4, -9.2, -9, -8.8, -8.6, -8.4, -8.2, -8, -7.8, -7.6, -7.4, -7.2, -7, -6.8,...
GPP_daily_mu 3.1
GPP_daily_lower -Inf
GPP_daily_sigma 6
ER_daily_mu -7.1
ER_daily_upper Inf
ER_daily_sigma 7.1
K600_lnQ_nodediffs_sdlog 0.1
K600_lnQ_nodes_meanlog 2.484906649788, 2.484906649788, 2.484906649788, 2.484906649788, 2.484906649788, 2.484906649788...
K600_lnQ_nodes_sdlog 1.32, 1.32, 1.32, 1.32, 1.32, 1.32, 1.32, 1.32, 1.32, 1.32, 1.32, 1.32, 1.32, 1.32, 1.32, 1.32...
K600_daily_sigma_sigma 0.24
err_obs_iid_sigma_scale 0.03
err_proc_iid_sigma_scale 1e-05
params_in GPP_daily_mu, GPP_daily_lower, GPP_daily_sigma, ER_daily_mu, ER_daily_upper, ER_daily_sigma, K...
params_out GPP, ER, GPP_daily, ER_daily, K600_daily, K600_daily_predlog, lnK600_lnQ_nodes, K600_daily_sig...
n_chains 4
n_cores 4
burnin_steps 500
saved_steps 500
thin_steps 1
verbose FALSE
get_fit(mm) %>% lapply(names)
$daily
[1] "date" "GPP_mean" "GPP_se_mean" "GPP_sd"
[5] "GPP_2.5pct" "GPP_25pct" "GPP_50pct" "GPP_75pct"
[9] "GPP_97.5pct" "GPP_n_eff" "GPP_Rhat" "ER_mean"
[13] "ER_se_mean" "ER_sd" "ER_2.5pct" "ER_25pct"
[17] "ER_50pct" "ER_75pct" "ER_97.5pct" "ER_n_eff"
[21] "ER_Rhat" "GPP_daily_mean" "GPP_daily_se_mean" "GPP_daily_sd"
[25] "GPP_daily_2.5pct" "GPP_daily_25pct" "GPP_daily_50pct" "GPP_daily_75pct"
[29] "GPP_daily_97.5pct" "GPP_daily_n_eff" "GPP_daily_Rhat" "ER_daily_mean"
[33] "ER_daily_se_mean" "ER_daily_sd" "ER_daily_2.5pct" "ER_daily_25pct"
[37] "ER_daily_50pct" "ER_daily_75pct" "ER_daily_97.5pct" "ER_daily_n_eff"
[41] "ER_daily_Rhat" "K600_daily_mean" "K600_daily_se_mean" "K600_daily_sd"
[45] "K600_daily_2.5pct" "K600_daily_25pct" "K600_daily_50pct" "K600_daily_75pct"
[49] "K600_daily_97.5pct" "K600_daily_n_eff" "K600_daily_Rhat" "K600_daily_predlog_mean"
[53] "K600_daily_predlog_se_mean" "K600_daily_predlog_sd" "K600_daily_predlog_2.5pct" "K600_daily_predlog_25pct"
[57] "K600_daily_predlog_50pct" "K600_daily_predlog_75pct" "K600_daily_predlog_97.5pct" "K600_daily_predlog_n_eff"
[61] "K600_daily_predlog_Rhat" "valid_day" "warnings" "errors"
$inst
[1] "date" "solar.time" "err_obs_iid_mean" "err_obs_iid_se_mean" "err_obs_iid_sd"
[6] "err_obs_iid_2.5pct" "err_obs_iid_25pct" "err_obs_iid_50pct" "err_obs_iid_75pct" "err_obs_iid_97.5pct"
[11] "err_obs_iid_n_eff" "err_obs_iid_Rhat" "err_proc_iid_mean" "err_proc_iid_se_mean" "err_proc_iid_sd"
[16] "err_proc_iid_2.5pct" "err_proc_iid_25pct" "err_proc_iid_50pct" "err_proc_iid_75pct" "err_proc_iid_97.5pct"
[21] "err_proc_iid_n_eff" "err_proc_iid_Rhat"
$overall
[1] "date_index" "time_index" "index" "err_obs_iid_sigma_mean"
[5] "err_obs_iid_sigma_se_mean" "err_obs_iid_sigma_sd" "err_obs_iid_sigma_2.5pct" "err_obs_iid_sigma_25pct"
[9] "err_obs_iid_sigma_50pct" "err_obs_iid_sigma_75pct" "err_obs_iid_sigma_97.5pct" "err_obs_iid_sigma_n_eff"
[13] "err_obs_iid_sigma_Rhat" "err_proc_iid_sigma_mean" "err_proc_iid_sigma_se_mean" "err_proc_iid_sigma_sd"
[17] "err_proc_iid_sigma_2.5pct" "err_proc_iid_sigma_25pct" "err_proc_iid_sigma_50pct" "err_proc_iid_sigma_75pct"
[21] "err_proc_iid_sigma_97.5pct" "err_proc_iid_sigma_n_eff" "err_proc_iid_sigma_Rhat" "lp___mean"
[25] "lp___se_mean" "lp___sd" "lp___2.5pct" "lp___25pct"
[29] "lp___50pct" "lp___75pct" "lp___97.5pct" "lp___n_eff"
[33] "lp___Rhat"
$KQ_overall
[1] "date_index" "time_index" "index" "K600_daily_sigma_mean"
[5] "K600_daily_sigma_se_mean" "K600_daily_sigma_sd" "K600_daily_sigma_2.5pct" "K600_daily_sigma_25pct"
[9] "K600_daily_sigma_50pct" "K600_daily_sigma_75pct" "K600_daily_sigma_97.5pct" "K600_daily_sigma_n_eff"
[13] "K600_daily_sigma_Rhat"
$KQ_binned
[1] "date_index" "time_index" "index" "lnK600_lnQ_nodes_mean"
[5] "lnK600_lnQ_nodes_se_mean" "lnK600_lnQ_nodes_sd" "lnK600_lnQ_nodes_2.5pct" "lnK600_lnQ_nodes_25pct"
[9] "lnK600_lnQ_nodes_50pct" "lnK600_lnQ_nodes_75pct" "lnK600_lnQ_nodes_97.5pct" "lnK600_lnQ_nodes_n_eff"
[13] "lnK600_lnQ_nodes_Rhat"
$warnings
NULL
$errors
NULL
Again, thanks so much,
Tzu-Yao
Hi Tzu-Yao, take out the lapply(names) from your get_fit() command.
I think DO_R2 may indeed be what you're looking for - check out the formula at https://github.com/USGS-R/streamMetabolizer/blob/master/inst/models/b_np_oipi_tr_psrckm.stan#L142 and note that you probably need to add it to your params_out specification to be able to find it with get_fit().
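A minimal sketch of that change, assuming `bayes_specs` is the specs object later passed to `metab()` (the object name and the refit step are placeholders, not from this thread):

```r
# Sketch: request DO_R2 as an extra saved parameter.
# bayes_specs is assumed to be your existing specs object.
bayes_specs$params_out <- c(bayes_specs$params_out, 'DO_R2')

# Then refit and inspect the fit components for the new output:
# mm <- metab(bayes_specs, data = dat)
# get_fit(mm) %>% lapply(names)
```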
err_proc_iid_sigma_scale is the only/primary lever for affecting the magnitude of the fitted process errors. I'd like @robohall 's thoughts on this too, but I think we see the models appear to ignore this parameter when the other parameters that can be adjusted (e.g., GPP_daily or Pmax/alpha, ER_daily, K600_daily, and err_obs_iid_sigma) are too tightly constrained, or when there are unmodeled processes or process dynamics occurring in the ecosystem (e.g., drifting clouds or spatial heterogeneity along the reach), such that there's just no better (higher-likelihood) way for the model to reproduce the shape of the observed DO curve than to call it process error.
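For reference, a hedged sketch of where that lever lives; the model arguments mirror the pooled Stan model used in this thread, and the numeric value is illustrative only, not a recommendation:

```r
# Sketch: tightening the scale on the process-error SD prior.
bayes_name <- mm_name(type = 'bayes', pool_K600 = 'binned',
                      err_obs_iid = TRUE, err_proc_iid = TRUE)
# 0.001 is an illustrative value, not a recommendation
bayes_specs <- specs(bayes_name, err_proc_iid_sigma_scale = 0.001)
```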
I think DO_R2 may indeed be what you're looking for - check out the formula at https://github.com/USGS-R/streamMetabolizer/blob/master/inst/models/b_np_oipi_tr_psrckm.stan#L142 and note that you probably need to add it to your params_out specification to be able to find it with get_fit().
Hi Alison,
That's exactly what I was looking for. Thanks!
I tried to add "DO_R2" to the params_out in specs but only got warnings/errors and it couldn't run.
My specs after I added "DO_R2":
bayes_specs$params_out
[1] "GPP" "ER" "DO_R2" "GPP_daily" "ER_daily"
[6] "K600_daily" "K600_daily_predlog" "lnK600_lnQ_nodes" "K600_daily_sigma" "err_obs_iid_sigma"
[11] "err_obs_iid" "err_proc_iid_sigma" "err_proc_iid"
The warnings/errors:
$warnings
[1] "some chains had errors; consider specifying chains = 1 to debug"
[2] "Stan model 'b_Kb_oipi_tr_plrckm' does not contain samples."
In Viewer it showed:
"no parameter DO_R2; sampling not done"
Thank you,
Tzu-Yao
Try updating the package with remotes::install_github('USGS-R/streamMetabolizer') (because DO_R2 is a relatively new parameter).
Thanks Alison,
I updated streamMetabolizer with
remotes::install_github('appling/unitted')
remotes::install_github("USGS-R/streamMetabolizer")
and can now access the DO_R2 info.
However, I realized that the DO_R2 provided in the model is the R2 of DO observed relative to DO modeled, instead of DO observed relative to the deterministic predictions (R2det) as described in Appling, Hall, Yackulic & Arroita (2018). The former takes into account both observation errors and process errors, while the latter does not include the process error correction and thus lets me see how strong the signal is before it is corrected with process errors.
I wonder if the model also provides R2det, if that makes sense?
Thank you so much,
Tzu-Yao
Ah, thanks for the reminder - amazing how much I can forget after writing a whole paper on something! The predict_DO() function should return a data.frame with a column called DO.pure that is the predictions without process error correction. You can then compute R2 for DO.pure relative to DO.obs.
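A minimal sketch of that computation; the helper is plain base R, while the `DO.pure` column is assumed to be present per the reply above:

```r
# Coefficient of determination: 1 - SSE/SST, dropping NA pairs
r2 <- function(obs, pred) {
  ok <- stats::complete.cases(obs, pred)
  1 - sum((obs[ok] - pred[ok])^2) / sum((obs[ok] - mean(obs[ok]))^2)
}

# Then, assuming predict_DO(mm) returns a DO.pure column:
# preds <- predict_DO(mm)
# r2_det <- r2(preds$DO.obs, preds$DO.pure)  # vs. deterministic predictions
# r2_mod <- r2(preds$DO.obs, preds$DO.mod)   # vs. process-error-corrected
```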
Thank you Alison and Bob. Your answers/comments are really helpful!
err_proc_iid_sigma_scale is the only/primary lever for affecting the magnitude of the fitted process errors. I'd like @robohall 's thoughts on this too, but I think we see the models appear to ignore this parameter when the other parameters that can be adjusted (e.g., GPP_daily or Pmax/alpha, ER_daily, K600_daily, and err_obs_iid_sigma) are too tightly constrained, or when there are unmodeled processes or process dynamics occurring in the ecosystem (e.g., drifting clouds or spatial heterogeneity along the reach), such that there's just no better (higher-likelihood) way for the model to reproduce the shape of the observed DO curve than to call it process error.
Hi Alison, I think I'm still a little confused about the process error manipulation. As you said, it seems that the only parameter I can adjust in the model to change process error is err_proc_iid_sigma_scale. However, I found that in Appling, Hall, Yackulic & Arroita (2018), the experiment in section 2.2.2 actually held the process error constant, and I wonder if I could do the same thing (i.e., let the sigma of the process error be a constant)? Thanks.
Yeah, that was what I was looking at. Thank you for the clarification!
Ah, thanks for the reminder - amazing how much I can forget after writing a whole paper on something! The predict_DO() function should return a data.frame with a column called DO.pure that is the predictions without process error correction. You can then compute R2 for DO.pure relative to DO.obs.
Hi @aappling-usgs, sorry it's been a while. I guess I'm still confused about this...
I used the predict_DO function and it only outputs DO.obs and DO.mod. I'm not sure, but it seems that DO.pure is only provided in a "sim model" when simulating data?
A part of the predict_DO output:
date solar.time DO.obs DO.sat depth temp.water light discharge DO.mod
100 2017-07-01 2017-07-01 10:38:24 3.00 7.370137 0.8824491 31.79 763.6409 0.06887039 2.443049
101 2017-07-01 2017-07-01 11:08:24 3.34 7.337867 0.8844491 32.04 788.6277 0.07064655 2.532592
102 2017-07-01 2017-07-01 11:38:24 3.55 7.298487 0.8824491 32.35 802.4943 0.06766137 2.619626
103 2017-07-01 2017-07-01 12:08:24 3.60 7.292322 0.8824491 32.39 805.0035 0.06393055 2.702070
104 2017-07-01 2017-07-01 12:38:24 3.47 7.309420 0.8824491 32.24 796.1124 0.06732575 2.778523
105 2017-07-01 2017-07-01 13:08:24 3.67 7.286094 0.8824491 32.42 775.9731 0.06404350 2.847369
Also, while I was analyzing the errors in my model (on low-signal, high-noise data), I found that if I disregard process error (i.e., the observation-error model, Fig. 2), DO.mod actually fits DO.obs better. In this case, can I say that the observation-error model works better than the state-space model because the former is better at recognizing the DO signal? I think one way to see which is better is to check whether the process error term improves the autocorrelation of the residuals (and this brings me back to how I can access DO with and without process error correction)?
Fig. 2. An observation-error model
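One way to run that residual-autocorrelation check is a sketch like the following; the helper is base R, while `mm_oe` and `mm_ss` are placeholder names for an observation-error-only fit and a state-space fit:

```r
# Lag-1 autocorrelation of a residual series (tolerates NAs)
lag1_acf <- function(resid) {
  a <- stats::acf(resid, lag.max = 1, plot = FALSE, na.action = stats::na.pass)
  a$acf[2]
}

# Hypothetical comparison of two fits:
# res_oe <- with(predict_DO(mm_oe), DO.obs - DO.mod)
# res_ss <- with(predict_DO(mm_ss), DO.obs - DO.mod)
# lag1_acf(res_oe); lag1_acf(res_ss)  # lower = less structure left in residuals
```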
Thank you,
Tzu-Yao