Submission of bssm for Bayesian state space modelling

helske opened this issue 3 years ago · 21 comments

Reviewers:
Submitting Author: Jouni Helske (@helske)
Other Package Authors: Matti Vihola (@mvihola)
Repository: https://github.com/helske/bssm
Version submitted: 2.0.0
Submission type: Stats
Badge grade: silver
Editor: @bbolker
Reviewers: @kingaa

Due date for @kingaa: 2022-05-27

Archive: TBD
Version accepted: TBD
Language: en

Paste the full DESCRIPTION file inside a code block below:

Package: bssm
Type: Package
Title: Bayesian Inference of Non-Linear and Non-Gaussian State Space
        Models
Version: 2.0.0
Authors@R: 
    c(person(given = "Jouni",
           family = "Helske",
           role = c("aut", "cre"),
           email = "jouni.helske@iki.fi",
           comment = c(ORCID = "0000-0001-7130-793X")),
      person(given = "Matti",
           family = "Vihola",
           role = "aut",
           comment = c(ORCID = "0000-0002-8041-7222")))
Description: Efficient methods for Bayesian inference of state space models 
    via particle Markov chain Monte Carlo (MCMC) and MCMC based on parallel 
    importance sampling type weighted estimators 
    (Vihola, Helske, and Franks, 2020, <doi:10.1111/sjos.12492>). 
    Gaussian, Poisson, binomial, negative binomial, and Gamma
    observation densities and basic stochastic volatility models 
    with linear-Gaussian state dynamics, 
    as well as general non-linear Gaussian models and discretised 
    diffusion models are supported.
License: GPL (>= 2)
Depends: R (>= 3.5.0)
Suggests: 
    covr,
    ggplot2 (>= 2.0.0),
    KFAS (>= 1.2.1),
    knitr (>= 1.11),
    MASS,
    rmarkdown (>= 0.8.1),
    ramcmc,
    sde,
    sitmo,
    testthat
Imports: 
    magrittr,
    checkmate,
    coda (>= 0.18-1),
    diagis,
    dplyr,
    posterior,
    Rcpp (>= 0.12.3),
    rlang,
    tidyr
LinkingTo: ramcmc, Rcpp, RcppArmadillo, sitmo
SystemRequirements: C++11, pandoc (>= 1.12.3, needed for vignettes)
VignetteBuilder: knitr
BugReports: https://github.com/helske/bssm/issues
URL: https://github.com/helske/bssm
ByteCompile: true
Encoding: UTF-8
NeedsCompilation: yes
RoxygenNote: 7.1.2
Roxygen: list(markdown = TRUE, 
  roclets = c("namespace", "rd", "srr::srr_stats_roclet"))

Pre-submission Inquiry

A pre-submission inquiry has been approved in issue#<issue_num>
I have not made a pre-submission inquiry, but was asked to consider submitting by @mpadge and @noamross.

General Information

Who is the target audience and what are scientific applications of this package?

State space models provide a flexible framework for statistical inference of a broad class of time series and other dynamic data. The bssm package aims to provide easy to use and efficient functions for the Bayesian estimation of commonly used as well as more general user-defined state space models, which are usable in various application areas.

Paste your responses to our General Standard G1.1 here, describing whether your software is:

This is the first software to implement the IS-MCMC method of
Vihola, Helske, and Franks (2020) and the first R package to implement delayed
acceptance pseudo-marginal MCMC for state space models. The IS-MCMC method
is also available in the walker package for a
limited class of time-varying GLMs (a small subset of the models
supported by this package). Some of the functionality for exponential family
state space models is also available in KFAS, and
those models can be converted easily to bssm format for Bayesian analysis.

(If applicable) Does your package comply with our guidance around Ethics, Data Privacy and Human Subjects Research?

Not applicable.

Badging

What grade of badge are you aiming for? (bronze, silver, gold)

Silver sounds appropriate.

If aiming for silver or gold, describe which of the four aspects listed in the Guide for Authors chapter the package fulfils (at least one aspect for silver; three for gold)

The bssm package complies with a large number of standards in both the General category and the Bayesian Software category and its sub-categories. I see the package complying with several Time Series Software standards as well, although many of those standards do not seem well suited or applicable to general time series modelling via state space models and/or bssm, so at least for now I have focused on the General and Bayesian standards.

The modelling framework and the implemented algorithms are very general, and since the early versions the usability and features of bssm have been greatly extended to cover quite general models and applications. (Currently bssm has most of the features of the popular KFAS package for state space modelling, which has been used in various domains, and many more.)

Technical checks

Confirm each of the following by checking the box.

I/we have read the guide for authors and rOpenSci packaging guide.
I/we have read the Statistical Software Peer Review Guide for Authors. (this link doesn't actually work)
I/we have run autotest checks on the package, and ensured no tests fail.
(there are a few problems for which I have opened an issue/PR in the autotest repo).
The srr_stats_pre_submit() function confirms this package may be submitted.

This package:

does not violate the Terms of Service of any service it interacts with.
has a CRAN and OSI accepted license.
contains a README with instructions for installing the development version.

Publication options

Do you intend for this package to go on CRAN?
This package is already on CRAN.
Do you intend for this package to go on Bioconductor?

Code of conduct

I agree to abide by rOpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

Athene-ai commented 3 years ago

Ok thanks


Answer 1 · 2021-11-25T19:10:42.000Z

Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type @ropensci-review-bot help for help.

Answer 2 · 2021-11-25T19:10:45.000Z

🚀

The following problem was found in your submission template:

'author1' variable must be GitHub handle only ('@myhandle')
Editors: Please ensure these problems with the submission template are rectified. Package checks have been started regardless.

👋

Answer 3 · 2021-11-25T22:30:24.000Z

Checks for bssm (v2.0.0)

git hash: 835eba3a

✔️ Package is already on CRAN.
✔️ has a 'CITATION' file.
✔️ has a 'codemeta.json' file.
✔️ has a 'contributing' file.
✔️ uses 'roxygen2'.
✔️ 'DESCRIPTION' has a URL field.
✔️ 'DESCRIPTION' has a BugReports field.
✔️ Package has at least one HTML vignette
✔️ All functions have examples.
✔️ Package has continuous integration checks.
✔️ Package coverage is 80.5%.
✔️ R CMD check found no errors.
✔️ R CMD check found no warnings.

Package License: GPL (>= 2)

1. rOpenSci Statistical Standards (`srr` package)

This package is in the following category:

Bayesian and Monte Carlo

✔️ All applicable standards [v0.1.0.007] have been documented in this package (92 complied with; 32 N/A standards)

Click to see the report of author-reported standards compliance of the package with links to associated lines of code, which can be re-generated locally by running the srr_report() function from within a local clone of the repository.

2. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has:

code in C++ (73% in 43 files) and R (27% in 31 files)
2 authors
4 vignettes
5 internal data files
9 imported packages
77 exported functions (median 24 lines of code)
261 non-exported functions in R (median 7 lines of code)
291 C++ functions (median 29 lines of code)

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages
The following terminology is used:

loc = "Lines of Code"
fn = "function"
exp/not_exp = exported / not exported

The final measure (fn_call_network_size) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

measure	value	percentile	noteworthy
files_R	31	89.1
files_src	43	98.4
files_vignettes	9	99.0
files_tests	16	93.5
loc_R	3992	93.2
loc_src	10961	93.8
loc_vignettes	1452	95.9	TRUE
loc_tests	1705	90.7
num_vignettes	4	96.0	TRUE
data_size_total	1153190	96.2	TRUE
data_size_median	2342	69.4
n_fns_r	338	93.3
n_fns_r_exported	77	92.7
n_fns_r_not_exported	261	93.3
n_fns_src	291	98.0	TRUE
n_fns_per_file_r	6	69.2
n_fns_per_file_src	5	43.4
num_params_per_fn	4	67.6
loc_per_fn_r	8	33.9
loc_per_fn_r_exp	24	55.8
loc_per_fn_r_not_exp	7	29.9
loc_per_fn_src	29	86.1
rel_whitespace_R	17	91.9
rel_whitespace_src	15	98.3	TRUE
rel_whitespace_vignettes	23	97.5	TRUE
rel_whitespace_tests	22	96.8	TRUE
doclines_per_fn_exp	78	83.4
doclines_per_fn_not_exp	0	0.0	TRUE
fn_call_network_size	1035	97.9	TRUE

2a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package

3. `goodpractice` and other checks

Details of goodpractice and other checks (click to open)

3a. Continuous Integration Badges

GitHub Workflow Results

name	conclusion	sha	date
R-CMD-check		8c52ea	2021-11-25

3b. `goodpractice` results

`R CMD check` with rcmdcheck

R CMD check generated the following note:

checking installed package size ... NOTE
installed size is 69.1Mb
sub-directories of 1Mb or more:
data 1.1Mb
doc 3.4Mb
libs 64.0Mb

R CMD check generated the following check_fail:

rcmdcheck_reasonable_installed_size

Test coverage with covr

Package coverage: 80.54

Cyclocomplexity with cyclocomp

The following functions have cyclocomplexity >= 15:

function	cyclocomplexity
bsm_ng	34
bsm_lg	30
predict.mcmc_output	30
check_y	28
run_mcmc.nongaussian	25
as_bssm	22
create_regression	19
run_mcmc.ssm_nlg	19
run_mcmc.ssm_sde	19
check_u	17
run_mcmc.lineargaussian	16
summary.mcmc_output	16

Static code analyses with lintr

lintr found the following 85 potential issues:

message	number of times
Lines should not be more than 80 characters.	85

Package Versions

package	version
pkgstats	0.0.3.52
pkgcheck	0.0.2.149
srr	0.0.1.141

Editor-in-Chief Instructions:

This package is in top shape and may be passed on to a handling editor

Answer 4 · 2021-12-02T15:02:42.000Z

@ropensci-review-bot assign @bbolker as editor

Answer 5 · 2021-12-02T15:02:45.000Z

Assigned! @bbolker is now the editor

Answer 6 · 2022-05-05T21:01:12.000Z

@ropensci-review-bot seeking reviewers

Answer 7 · 2022-05-05T21:01:13.000Z

Please add this badge to the README of your package repository:

[![Status at rOpenSci Software Peer Review](https://badges.ropensci.org/489_status.svg)](https://github.com/ropensci/software-review/issues/489)

Furthermore, if your package does not have a NEWS.md file yet, please create one to capture the changes made during the review process. See https://devguide.ropensci.org/releasing.html#news

Answer 8 · 2022-05-06T13:58:37.000Z

@ropensci-review-bot assign @kingaa to reviewers

Answer 9 · 2022-05-06T13:58:39.000Z

@kingaa added to the reviewers list. Review due date is 2022-05-27. Thanks @kingaa for accepting to review! Please refer to our reviewer guide.

Answer 10 · 2022-05-06T13:58:41.000Z

@kingaa: If you haven't done so, please fill this form for us to update our reviewers records.

Answer 11 · 2022-05-20T08:15:12.000Z

Hi!

I would like to review this package

Answer 12 · 2022-05-20T08:24:20.000Z

@Athene-ai I'd like to kindly remind you not to volunteer in all issues especially as you've already got one review in progress (thank you!). #523

We also tend not to ask the same people to review twice in a row.

Answer 13 · 2022-05-20T08:28:30.000Z

right .. so sorry for the mistake ..

Answer 14 · 2022-06-25T16:18:40.000Z

The documentation of standards is admirable.

Examining the source code, it appears to be very carefully written and in a good style. I expect it will be easy to track down bugs and to maintain the code written in this style.

Adopting the user's point of view, I attempted to plunge in. I found it more difficult to do so than I suspect the authors would like. This leads to some suggestions regarding the documentation.

First, it would be good if package?bssm gave documentation on the package as a whole, with orientation toward the major features for the novice user. The help available from library(help=bssm) is mnemonic but not a good introduction.

I attempted to follow the "bssm" vignette. The first example, which begins

data("nhtemp", package = "datasets")
prior <- halfnormal(1, 10)

left me wondering about the "half-normal" prior. The man page on priors is minimally informative. I do not see formulae for the prior densities, nor are there descriptions of their parametrization, nor even plots. The examples there seem mainly to be for automated-checking purposes: they shed little light for the user. Improved documentation would be helpful, as would examples of their actual usage, and some plots.
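
For orientation, a half-normal prior is a zero-mean normal distribution folded onto the non-negative axis. The base-R sketch below evaluates and plots that density. It assumes that the second argument of halfnormal() is the standard deviation of the underlying normal and that the first is only an initial value for the sampler; dhalfnormal is a hypothetical helper written for this illustration, not a bssm function.

```r
# Hypothetical helper (not part of bssm): density of a half-normal
# distribution whose underlying normal has mean 0 and standard deviation sd.
dhalfnormal <- function(x, sd = 1) {
  ifelse(x < 0, 0, 2 * dnorm(x, mean = 0, sd = sd))
}

# Assumed reading of halfnormal(1, 10): initial value 1, sd = 10.
curve(dhalfnormal(x, sd = 10), from = 0, to = 40,
      xlab = "standard deviation parameter", ylab = "prior density")

# Sanity check: the density should integrate to (approximately) 1.
integrate(dhalfnormal, lower = 0, upper = Inf, sd = 10)
```

With sd = 10 the prior is nearly flat over the range of values plausible for yearly temperature data, which is one way to read it as weakly informative.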

Continuing with the example, I did

bsm_model <- bsm_lg(y=nhtemp,sd_y=prior,sd_level=prior,sd_slope=prior)
mcmc_bsm <- run_mcmc(bsm_model, iter = 4e4, seed = 1)
summary(mcmc_bsm)
mcmc_bsm
plot(mcmc_bsm)

This did give some information, though the plot method failed. Following the vignette, I was able to plot the approximate posterior densities using

mcmc_bsm |> 
  as.data.frame() |> 
  ggplot(aes(x=value,group=variable))+
  geom_density()+
  facet_wrap(~variable,scales="free")

However, I imagine other experienced R users, like me, would appreciate more informative outputs from the summary and plot methods.

Also, I immediately found myself wanting to know:

How can MCMC convergence be diagnosed?
How can I make a plot showing prior and posterior, perhaps on the same axes?
How can I run multiple independent chains?

I notice that, among the list of standards deemed inapplicable by the package authors are some that speak to these questions, and to the issues with understanding prior distributions which I mentioned before.

I also noticed that this vignette mentions and discusses, but does not demonstrate, nonlinear, non-Gaussian models. Since such models are a major feature of the package, some demonstration would be appreciated.

Turning to the "growth_model" vignette, I was intrigued to see that there are facilities for including snippets of C++ code. However, I was not able to follow the vignette well enough to reproduce those calculations myself. I would appreciate more detailed, step-by-step instructions on how to compile the snippets shown. (For example, I get errors regarding the unknown namespace arma.)

Though there is more exploration I would like to do, I will stop here for now. It is my understanding (though I am happy to be corrected) that this review process is intended to include back-and-forth. Some comments from the authors may help me complete what I hope will be a useful review.

Answer 15 · 2022-07-06T08:52:29.000Z

Thanks, @kingaa, for your helpful comments. I have now updated the package based on your suggestions, mainly by improving the documentation and adding a new plot method.

I opted to just refer to the R Journal paper and the vignette in ?bssm instead of repeating the material there, mainly because the mathematical formulas are more readable in those. I did, however, expand the help page a bit by noting the main functions of the package (the model building functions and run_mcmc), with some additional comments on the Nile example and pointers on what to do with the samples obtained from run_mcmc.

Good point about the prior documentation. I have now added a bit more detail about the definitions of the priors, although these are fairly standard in terms of their pdfs. I also now note that the priors for the general models (e.g. ones defined via ssm_ulg) are specified as a user-defined R function.

I also added a default plot method for the MCMC output, mimicking the classic density + trace plot style of coda etc. The reason why there aren't many default visualizations in the package is mainly that the exact needs tend to depend on the user and the model, so in my experience users (at least myself) tend to build their custom plots manually anyway. But this kind of default plot for the hyperparameters does of course make sense. For a combined plot of priors and posteriors, it is pretty difficult to construct a default plot, because if the priors are defined via a user-defined function (in the case of, say, an ssm_ulg model), we can only access the joint log-prior density of the model parameters.
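
When the prior is one of the built-in ones with a known density, a manual overlay is straightforward. The base-R sketch below uses simulated values as a stand-in for posterior draws of a single standard deviation parameter (they are not real bssm output) and assumes a half-normal prior with sd = 10, as in the reviewer's example.

```r
set.seed(1)

# Stand-in for posterior draws of one parameter (simulated, not bssm output).
post_draws <- abs(rnorm(5000, mean = 1.1, sd = 0.15))

# Kernel density estimate of the posterior, restricted to non-negative values.
post <- density(post_draws, from = 0)

plot(post, main = "Prior vs. posterior", xlab = "sd_y", lwd = 2)
# Half-normal prior density: twice the N(0, sd^2) density on x >= 0.
curve(2 * dnorm(x, mean = 0, sd = 10), from = 0, add = TRUE, lty = 2)
legend("topright", legend = c("posterior", "prior"), lty = c(1, 2),
       lwd = c(2, 1))
```

With real output, post_draws would be replaced by the sampled values of the parameter of interest, for example the value column of as.data.frame() filtered to one variable.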

Regarding the summary method, I opted to give the typical details of the model in the print method, whereas summary method provides the actual summaries of the model parameters.

For the MCMC diagnostics, there are some basic diagnostics available via the check_diagnostics function, and the new default plot method provides some graphical hints about convergence. We tend to refer to the posterior and bayesplot packages for these, instead of our own re-implementations.
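
As a rough illustration of what a between-chain convergence diagnostic measures, here is a base-R sketch of the classic Gelman-Rubin potential scale reduction factor. It is the original non-split, non-rank-normalised version, not the refined estimator implemented in the posterior package, and gelman_rubin is a hypothetical helper applied to simulated chains rather than bssm output.

```r
# Hypothetical helper: classic Gelman-Rubin statistic for one parameter.
# `chains` is a matrix with one column per chain and one row per iteration.
gelman_rubin <- function(chains) {
  n <- nrow(chains)
  W <- mean(apply(chains, 2, var))   # mean within-chain variance
  B <- n * var(colMeans(chains))     # between-chain variance
  var_plus <- (n - 1) / n * W + B / n
  sqrt(var_plus / W)
}

set.seed(1)
# Four simulated chains that all target the same distribution.
good <- matrix(rnorm(4000), nrow = 1000, ncol = 4)
# Same chains, but one is stuck in a different region.
bad <- good
bad[, 4] <- bad[, 4] + 3

gelman_rubin(good)  # close to 1
gelman_rubin(bad)   # clearly above 1
```

Values near 1 indicate that the chains agree; values well above 1 suggest that at least one chain has not reached the common target.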

Regarding the multiple chains, as stated in the NA standards, this is not automatically supported, but the posterior package provides a relatively easy way to combine the samples from multiple runs, as illustrated in ?as_draws_df.mcmc_output.
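
A minimal sketch of that workflow follows, reusing the reviewer's nhtemp model. It assumes that as_draws() has a method for run_mcmc output and that posterior::bind_draws(along = "chain") accepts the resulting objects, as described in ?as_draws_df.mcmc_output; the number of chains and iterations are arbitrary choices.

```r
library(bssm)
library(posterior)

data("nhtemp", package = "datasets")
prior <- halfnormal(1, 10)
bsm_model <- bsm_lg(y = nhtemp, sd_y = prior,
                    sd_level = prior, sd_slope = prior)

# One independent run per seed. All runs start from the same initial
# values here; dispersed starting values would give a stricter check.
fits <- lapply(1:3, function(s) run_mcmc(bsm_model, iter = 1e4, seed = s))

# Convert each run to a draws object and stack the runs as separate chains.
draws <- do.call(bind_draws,
                 c(lapply(fits, as_draws), list(along = "chain")))

nchains(draws)

# Rhat and bulk effective sample size for one hyperparameter.
sd_y <- extract_variable_matrix(draws, "sd_y")
rhat(sd_y)
ess_bulk(sd_y)
```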

The non-linear models are indeed not discussed in detail in the main vignette, but they are covered in the other vignettes (growth_model and sde_model). I'm not sure what causes the namespace error on your end, as the model seems to compile fine on my computer and on CRAN. These errors are a bit difficult to debug, and to be honest the C++ interface is likely a bit difficult to use for those not familiar with C++ (and perhaps with the Armadillo library).

Answer 16 · 2023-02-03T19:19:40.000Z

Sorry to jump in. My EiC rotation just started and I'm checking the status of every open issue.

I'm curious about two things:

1. I see only one reviewer. @noamross are we searching for a second reviewer, or leaving it at one?
2. @noamross I see you replaced @bbolker. Do we need to update the top comment with the latest editor?

@kingaa RE

Though there is more exploration I would like to do, I will stop here for now. It is my understanding (though I am happy to be corrected) that this review process is intended to include back-and-forth.

Typically we ask reviewers to complete their review in one go. The authors address the comments of both reviewers. Then the reviewers either (a) approve the changes or (b) request more changes.

However, before you use more of your valuable time let's see what response I get.

Answer 17 · 2024-02-28T15:43:17.000Z

Since there was no response to the last query, I'm going to tag this issue as on "hold". If work resumes, we can update the tags as necessary.

Answer 18 · 2024-02-28T16:42:27.000Z

@ropensci-review-bot put on hold

Answer 19 · 2024-02-28T16:42:30.000Z

Submission on hold!

Answer 20 · 2024-05-28T16:42:33.000Z

@ldecicco-USGS: Please review the holding status
