Submission of bssm for Bayesian state space modelling
helske opened this issue ยท 21 comments
Reviewers:
Submitting Author: Jouni Helske (@helske)
Other Package Authors: (delete if none) Name (@mvihola)
Repository: https://github.com/helske/bssm
Version submitted: 2.0.0
Submission type: Stats
Badge grade: silver
Editor: @bbolker
Reviewers: @kingaa
Archive: TBD
Version accepted: TBD
Language: en
- Paste the full DESCRIPTION file inside a code block below:
Package: bssm
Type: Package
Title: Bayesian Inference of Non-Linear and Non-Gaussian State Space
Models
Version: 2.0.0
Authors@R:
c(person(given = "Jouni",
family = "Helske",
role = c("aut", "cre"),
email = "jouni.helske@iki.fi",
comment = c(ORCID = "0000-0001-7130-793X")),
person(given = "Matti",
family = "Vihola",
role = "aut",
comment = c(ORCID = "0000-0002-8041-7222")))
Description: Efficient methods for Bayesian inference of state space models
via particle Markov chain Monte Carlo (MCMC) and MCMC based on parallel
importance sampling type weighted estimators
(Vihola, Helske, and Franks, 2020, <doi:10.1111/sjos.12492>).
Gaussian, Poisson, binomial, negative binomial, and Gamma
observation densities and basic stochastic volatility models
with linear-Gaussian state dynamics,
as well as general non-linear Gaussian models and discretised
diffusion models are supported.
License: GPL (>= 2)
Depends: R (>= 3.5.0)
Suggests:
covr,
ggplot2 (>= 2.0.0),
KFAS (>= 1.2.1),
knitr (>= 1.11),
MASS,
rmarkdown (>= 0.8.1),
ramcmc,
sde,
sitmo,
testthat
Imports:
magrittr,
checkmate,
coda (>= 0.18-1),
diagis,
dplyr,
posterior,
Rcpp (>= 0.12.3),
rlang,
tidyr
LinkingTo: ramcmc, Rcpp, RcppArmadillo, sitmo
SystemRequirements: C++11, pandoc (>= 1.12.3, needed for vignettes)
VignetteBuilder: knitr
BugReports: https://github.com/helske/bssm/issues
URL: https://github.com/helske/bssm
ByteCompile: true
Encoding: UTF-8
NeedsCompilation: yes
RoxygenNote: 7.1.2
Roxygen: list(markdown = TRUE,
roclets = c("namespace", "rd", "srr::srr_stats_roclet"))
Pre-submission Inquiry
- A pre-submission inquiry has been approved in issue#<issue_num>
I have not made a pre-submission inquiry, but was asked to consider submitting by @mpadge and @noamross.
General Information
- Who is the target audience and what are scientific applications of this package?
State space models provide a flexible framework for statistical inference of a broad class of time series and other dynamic data. The bssm
package aims to provide easy to use and efficient functions for the Bayesian estimation of commonly used as well as more general user-defined state space models, which are usable in various application areas.
- Paste your responses to our General Standard G1.1 here, describing whether your software is:
This is the first software to implement the IS-MCMC by
Vihola, Helske, and Franks (2020) and first R package to implement delayed
acceptance pseudo-marginal MCMC for state space models. The IS-MCMC method
is also available in walker package for a
limited class of time-varying GLMss (a small subset of the models
supported by this package). Some of the functionality for exponential family
state space models is also available in KFAS, and
those models can be converted easily to bssm format for Bayesian analysis.
- (If applicable) Does your package comply with our guidance around Ethics, Data Privacy and Human Subjects Research?
Not applicable.
Badging
- What grade of badge are you aiming for? (bronze, silver, gold)
Silver sounds appropriate.
- If aiming for silver or gold, describe which of the four aspects listed in the Guide for Authors chapter the package fulfils (at least one aspect for silver; three for gold)
The bssm
complies with a large number of standards both in the general category as well as in Bayesian Software category and their sub-categories. I see the package complying with several Time Series Software standards as well, although many of those standards do not seem to be well suited or applicable to general time series modelling via state space models and/or bssm
, so at least for now I have focused on the General and Bayesian standards.
The modelling framework and the implemented algorithms are very general, and since the early versions, the usability and features of the bssm
are greatly improved to quite general models and applications (Currently bssm
has most of the same features and many more as in the popular KFAS
package for state space modelling which has been used in various domains).
Technical checks
Confirm each of the following by checking the box.
- I/we have read the guide for authors and rOpenSci packaging guide.
- I/we have read the Statistical Software Peer Review Guide for Authors. (this link doesn't actually work)
- I/we have run
autotest
checks on the package, and ensured no tests fail.
(there are few problems for which I have opened an issue/PR in the autotest repo). - The
srr_stats_pre_submit()
function confirms this package may be submitted.
This package:
- does not violate the Terms of Service of any service it interacts with.
- has a CRAN and OSI accepted license.
- contains a README with instructions for installing the development version.
Publication options
- Do you intend for this package to go on CRAN?
This package is already on CRAN. - Do you intend for this package to go on Bioconductor?
Code of conduct
- I agree to abide by rOpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.
Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type @ropensci-review-bot help
for help.
๐
The following problem was found in your submission template:
- 'author1' variable must be GitHub handle only ('@myhandle')
Editors: Please ensure these problems with the submission template are rectified. Package checks have been started regardless.
๐
Checks for bssm (v2.0.0)
git hash: 835eba3a
- โ๏ธ Package is already on CRAN.
- โ๏ธ has a 'CITATION' file.
- โ๏ธ has a 'codemeta.json' file.
- โ๏ธ has a 'contributing' file.
- โ๏ธ uses 'roxygen2'.
- โ๏ธ 'DESCRIPTION' has a URL field.
- โ๏ธ 'DESCRIPTION' has a BugReports field.
- โ๏ธ Package has at least one HTML vignette
- โ๏ธ All functions have examples.
- โ๏ธ Package has continuous integration checks.
- โ๏ธ Package coverage is 80.5%.
- โ๏ธ R CMD check found no errors.
- โ๏ธ R CMD check found no warnings.
Package License: GPL (>= 2)
1. rOpenSci Statistical Standards (srr
package)
This package is in the following category:
- Bayesian and Monte Carlo
โ๏ธ All applicable standards [v0.1.0.007] have been documented in this package (92 complied with; 32 N/A standards)
Click to see the report of author-reported standards compliance of the package with links to associated lines of code, which can be re-generated locally by running the srr_report()
function from within a local clone of the repository.
2. Statistical Properties
This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.
Details of statistical properties (click to open)
The package has:
- code in C++ (73% in 43 files) and R (27% in 31 files)
- 2 authors
- 4 vignettes
- 5 internal data files
- 9 imported packages
- 77 exported functions (median 24 lines of code)
- 261 non-exported functions in R (median 7 lines of code)
- 291 R functions (median 29 lines of code)
Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages
The following terminology is used:
loc
= "Lines of Code"fn
= "function"exp
/not_exp
= exported / not exported
The final measure (fn_call_network_size
) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.
measure | value | percentile | noteworthy |
---|---|---|---|
files_R | 31 | 89.1 | |
files_src | 43 | 98.4 | |
files_vignettes | 9 | 99.0 | |
files_tests | 16 | 93.5 | |
loc_R | 3992 | 93.2 | |
loc_src | 10961 | 93.8 | |
loc_vignettes | 1452 | 95.9 | TRUE |
loc_tests | 1705 | 90.7 | |
num_vignettes | 4 | 96.0 | TRUE |
data_size_total | 1153190 | 96.2 | TRUE |
data_size_median | 2342 | 69.4 | |
n_fns_r | 338 | 93.3 | |
n_fns_r_exported | 77 | 92.7 | |
n_fns_r_not_exported | 261 | 93.3 | |
n_fns_src | 291 | 98.0 | TRUE |
n_fns_per_file_r | 6 | 69.2 | |
n_fns_per_file_src | 5 | 43.4 | |
num_params_per_fn | 4 | 67.6 | |
loc_per_fn_r | 8 | 33.9 | |
loc_per_fn_r_exp | 24 | 55.8 | |
loc_per_fn_r_not_exp | 7 | 29.9 | |
loc_per_fn_src | 29 | 86.1 | |
rel_whitespace_R | 17 | 91.9 | |
rel_whitespace_src | 15 | 98.3 | TRUE |
rel_whitespace_vignettes | 23 | 97.5 | TRUE |
rel_whitespace_tests | 22 | 96.8 | TRUE |
doclines_per_fn_exp | 78 | 83.4 | |
doclines_per_fn_not_exp | 0 | 0.0 | TRUE |
fn_call_network_size | 1035 | 97.9 | TRUE |
2a. Network visualisation
Click to see the interactive network visualisation of calls between objects in package
3. goodpractice
and other checks
Details of goodpractice and other checks (click to open)
3a. Continuous Integration Badges
GitHub Workflow Results
name | conclusion | sha | date |
---|---|---|---|
R-CMD-check | 8c52ea | 2021-11-25 |
3b. goodpractice
results
R CMD check
with rcmdcheck
R CMD check generated the following note:
- checking installed package size ... NOTE
installed size is 69.1Mb
sub-directories of 1Mb or more:
data 1.1Mb
doc 3.4Mb
libs 64.0Mb
R CMD check generated the following check_fail:
- rcmdcheck_reasonable_installed_size
Test coverage with covr
Package coverage: 80.54
Cyclocomplexity with cyclocomp
The following functions have cyclocomplexity >= 15:
function | cyclocomplexity |
---|---|
bsm_ng | 34 |
bsm_lg | 30 |
predict.mcmc_output | 30 |
check_y | 28 |
run_mcmc.nongaussian | 25 |
as_bssm | 22 |
create_regression | 19 |
run_mcmc.ssm_nlg | 19 |
run_mcmc.ssm_sde | 19 |
check_u | 17 |
run_mcmc.lineargaussian | 16 |
summary.mcmc_output | 16 |
Static code analyses with lintr
lintr found the following 85 potential issues:
message | number of times |
---|---|
Lines should not be more than 80 characters. | 85 |
Package Versions
package | version |
---|---|
pkgstats | 0.0.3.52 |
pkgcheck | 0.0.2.149 |
srr | 0.0.1.141 |
Editor-in-Chief Instructions:
This package is in top shape and may be passed on to a handling editor
@ropensci-review-bot assign @bbolker as editor
Assigned! @bbolker is now the editor
@ropensci-review-bot seeking reviewers
Please add this badge to the README of your package repository:
[![Status at rOpenSci Software Peer Review](https://badges.ropensci.org/489_status.svg)](https://github.com/ropensci/software-review/issues/489)
Furthermore, if your package does not have a NEWS.md file yet, please create one to capture the changes made during the review process. See https://devguide.ropensci.org/releasing.html#news
Ok thanks
@ropensci-review-bot assign @kingaa to reviewers
@kingaa added to the reviewers list. Review due date is 2022-05-27. Thanks @kingaa for accepting to review! Please refer to our reviewer guide.
Hi!
I would like to review this package
@Athene-ai I'd like to kindly remind you not to volunteer in all issues especially as you've already got one review in progress (thank you!). #523
We also tend not to ask the same people to review twice in a row.
right .. so sorry for the mistake ..
The documentation of standards is admirable.
Examining the source code, it appears to be very carefully written and in a good style. I expect it will be easy to track down bugs and to maintain the code written in this style.
Adopting the user's point of view, I attempted to plunge in. I found it more difficult to do so than I suspect the authors would like. This leads to some suggestions regarding the documentation.
First, it would be good if package?bssm
gave documentation on the package as a whole, with orientation toward the major features for the novice user. The help available from library(help=bssm)
is mnemonic but not a good introduction.
I attempted to follow the "bssm" vignette. The first example, which begins
data("nhtemp", package = "datasets")
prior <- halfnormal(1, 10)
left me wondering about the "half-normal" prior. The man page on priors is minimally informative. I do not see formulae for the prior densities, nor are there descriptions of their parametrization, nor even plots. The examples there seem mainly to be for automated-checking purposes: they shed little light for the user. Improved documentation would be helpful, as would examples of their actual usage, and some plots.
Continuing with the example, I did
bsm_model <- bsm_lg(y=nhtemp,sd_y=prior,sd_level=prior,sd_slope=prior)
mcmc_bsm <- run_mcmc(bsm_model, iter = 4e4, seed = 1)
summary(mcmc_bsm)
mcmc_bsm
plot(mcmc_bsm)
This did give some information, though the plot
method failed. Following the vignette, I was able to plot the approximate posterior densities using
mcmc_bsm |>
as.data.frame() |>
ggplot(aes(x=value,group=variable))+
geom_density()+
facet_wrap(~variable,scales="free")
However, I imagine other experienced R users, like me, would appreciate more informative outputs from the summary
and plot
methods.
Also, I immediately found myself wanting to know:
- How can MCMC convergence be diagnosed?
- How can I make a plot showing prior and posterior, perhaps on the same axes?
- How I run multiple independent chains?
I notice that, among the list of standards deemed inapplicable by the package authors are some that speak to these questions, and to the issues with understanding prior distributions which I mentioned before.
I also noticed that this vignette mentions and discusses, but does not demonstrate, nonlinear, non-Gaussian models. Since such models are a major feature of the package, some demonstration would be appreciated.
Turning to the "growth_model" vignette, I was intrigued to see that there are facilities for including snippets of C++ code. However, I was not able to follow the vignette sufficiently well as to be able to reproduce those calculations myself. I would appreciate more detailed, step-by-step instructions on how to compile the snippets shown. (For example, I get errors regarding the unknown namespace arma
).
Though there is more exploration I would like to do, I will stop here for now. It is my understanding (though I am happy to be corrected) that this review process is intended to include back-and-forth. Some comments from the authors may help me complete what I hope will be a useful review.
Thanks, @kingaa for your helpful comments. I have now updated to the package based on your suggestions, mainly by improving the documentation and adding a new plot method.
I opted to just refer to the R Journal paper and the vignette in ?bssm
instead of repeating the material there, mainly because the mathematical formulas are more readable in those. I did however expand this now a bit by noting the main functions of the package (model building functions and run_mcmc
), with some additional comments to the Nile example and pointers regarding what to do with the obtained samples from the run_mcmc
.
Good point about the prior documentation, I now added bit more details about the definititions priors, although these are fairly standard in terms of the pdfs. I also now note that the prior for the general models (e.g. ones defined via ssm_ulg
) are defined as a user-defined R function.
I also added a default plot method for the MCMC output, mimicking the classic density + trace plot style of coda
etc. The reason why there aren't many default visualizations in the package is mainly that the exact needs tend to depend on the user and the model, so in the end, in my experience users (at least myself) tend to build their custom plots manually anyway. But this kind of default plot for the hyperparameters does of course make sense. For combined plot of priors and posteriors, it is pretty difficult to construct such a default plot because if the priors are defined via user defined function (in case of say ssm_ulg
model), we can only access the joint log-prior density of the model parameters.
Regarding the summary method, I opted to give the typical details of the model in the print method, whereas summary method provides the actual summaries of the model parameters.
For the MCMC diagnostics there are some basic diagnostics available via check_diagnostics
function, and the new default plot method provides some graphical hints about the convergence. We tend to refer to the posterior
and bayesplot
packages regarding these, instead of our own re-implementations.
Regarding the multiple chains, as stated in the NA standards, this is not automatically supported, but the posterior
package provides a relatively easy way to combine the samples from multiple runs, as illustrated in ?as_draws_df.mcmc_output
.
The non-linear models are indeed not discussed in detail in the main vignette, but in the other vignettes (growth_model
and sde_model
). I'm not sure what causes the namespace error on your end, as the model seems to compile fine on my computer and on CRAN. These are a bit difficult to debug and to be honest likely a bit difficult to use for those not familiar with the C++ (and perhaps with the Armadillo library).
Sorry to jump in. My EiC rotation just started and I'm checking the status of every open issue.
I'm curious about a two things:
- 1. I see only one reviewer. @noamross are we we searching for a second reviewer or leave it at one?
- 2. @noamross I see you replaced @bbolker. Do we need to update the top comment with the latest editor?
@kingaa RE
Though there is more exploration I would like to do, I will stop here for now. It is my understanding (though I am happy to be corrected) that this review process is intended to include back-and-forth.
Typically we ask reviewers to complete their review in one go. The authors address the comments of both reviewers. Then the reviewers either (a) approve the changes or (b) request more changes.
However, before you use more of your valuable time let's see what response I get.
Since there was no response to the last query, I'm going to tag this issue as on "hold". If work resumes, we can update the tags as necessary.
@ropensci-review-bot put on hold
Submission on hold!
@ldecicco-USGS: Please review the holding status