
Fast Wild Cluster Bootstrap Inference for Regression Models / OLS in R

Primary language: R. License: GNU General Public License v3.0 (GPL-3.0).

fwildclusterboot

Lifecycle: experimental

The fwildclusterboot package is an R port of Stata's boottest package.

It implements the fast wild cluster bootstrap algorithm developed in Roodman et al. (2019) for regression objects in R. It currently works for regression objects of type lm, felm and fixest, from base R and the lfe and fixest packages respectively.

The package's central function is boottest(). It allows the user to test univariate hypotheses using a wild cluster bootstrap. The "fast" algorithm developed in Roodman et al. (2019) makes it feasible to calculate test statistics based on a large number of bootstrap draws even for large samples, as long as the number of bootstrapping clusters is not too large.
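To make the idea concrete, the following base R sketch runs a naive restricted (WCR) wild cluster bootstrap with Rademacher weights. It is an illustration only: it re-estimates the regression in every bootstrap draw, so it is neither the fast algorithm of Roodman et al. (2019) nor the implementation used by boottest(). The function name wild_cluster_boot_naive() and its arguments are hypothetical, and no small-sample correction is applied to the cluster-robust variance.

```r
# Naive wild cluster bootstrap test of H0: beta_j = 0 (illustrative sketch only).
# y: response vector, X: design matrix (including intercept column),
# cluster: cluster id per observation, j: column index of the tested coefficient.
wild_cluster_boot_naive <- function(y, X, cluster, j, B = 999) {
  cluster <- as.integer(factor(cluster))
  G <- max(cluster)
  XtX_inv <- solve(crossprod(X))

  # cluster-robust t-statistic for coefficient j (no small-sample correction)
  t_stat <- function(y_vec) {
    beta <- XtX_inv %*% crossprod(X, y_vec)
    u <- as.vector(y_vec - X %*% beta)
    scores <- rowsum(X * u, cluster)  # G x k matrix of per-cluster score sums
    vcov <- XtX_inv %*% crossprod(scores) %*% XtX_inv
    beta[j] / sqrt(vcov[j, j])
  }

  t_obs <- t_stat(y)

  # restricted fit: impose the null by dropping column j
  X_r <- X[, -j, drop = FALSE]
  beta_r <- solve(crossprod(X_r), crossprod(X_r, y))
  fitted_r <- as.vector(X_r %*% beta_r)
  u_r <- y - fitted_r

  # one Rademacher weight per cluster, applied to all residuals in that cluster
  t_boot <- replicate(B, {
    v <- sample(c(-1, 1), G, replace = TRUE)
    t_stat(fitted_r + u_r * v[cluster])
  })

  list(t_stat = t_obs, t_boot = t_boot,
       p_value = mean(abs(t_boot) >= abs(t_obs)))
}

# usage on simulated data with 20 clusters of 15 observations each
set.seed(1)
G <- 20; n_g <- 15
cluster <- rep(seq_len(G), each = n_g)
x <- rnorm(G * n_g)
y <- 1 + 0.5 * x + rnorm(G)[cluster] + rnorm(G * n_g)
res <- wild_cluster_boot_naive(y, cbind(1, x), cluster, j = 2, B = 999)
res$p_value
```

Each draw here costs a full regression fit, which is what makes the naive loop slow for large samples.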

The fwildclusterboot package currently supports multi-dimensional clustering and one-dimensional hypotheses. It supports regression weights, multiple distributions of bootstrap weights, fixed effects, restricted (WCR) and unrestricted (WCU) bootstrap inference and subcluster bootstrapping for few treated clusters (MacKinnon & Webb, 2018).
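As a rough illustration of what "multiple distributions of bootstrap weights" means, the sketch below draws weights from the four distributions named in this README (Rademacher, Mammen, Webb and Normal) using base R only. All four have mean 0 and variance 1. The helper draw_weights() and its type labels are hypothetical; the package itself samples weights differently (via dqrng for three of the four types, as noted in the example further down).

```r
# Draw n wild bootstrap weights from one of four standard distributions
# (illustrative sketch only; not the package's sampler).
draw_weights <- function(n, type = c("rademacher", "mammen", "webb", "norm")) {
  type <- match.arg(type)
  switch(type,
    # two-point: -1 or 1 with equal probability
    rademacher = sample(c(-1, 1), n, replace = TRUE),
    # Mammen two-point distribution
    mammen = {
      s5 <- sqrt(5)
      sample(c(-(s5 - 1) / 2, (s5 + 1) / 2), n, replace = TRUE,
             prob = c((s5 + 1) / (2 * s5), (s5 - 1) / (2 * s5)))
    },
    # Webb six-point distribution, each point with probability 1/6
    webb = sample(c(-sqrt(3 / 2), -1, -sqrt(1 / 2),
                    sqrt(1 / 2), 1, sqrt(3 / 2)), n, replace = TRUE),
    # standard normal
    norm = rnorm(n)
  )
}

# empirical mean and variance of each weight type
set.seed(42)
types <- c("rademacher", "mammen", "webb", "norm")
moments <- sapply(types, function(tp) {
  w <- draw_weights(1e5, tp)
  c(mean = mean(w), var = var(w))
})
round(moments, 2)
```

With G clusters, Rademacher weights allow only 2^G distinct bootstrap samples, which is why a distribution with more support points, such as Webb's, is often suggested when the number of clusters is very small.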

If you are interested in the wild cluster bootstrap for IV models (Davidson & MacKinnon, 2010) or want to test multiple joint hypotheses, you can use the wildboottestjlr package, an R wrapper around the WildBootTests.jl Julia package. While fwildclusterboot is already quite fast (see the benchmarks below), the implementation of the wild bootstrap for OLS in WildBootTests.jl is, after compilation, orders of magnitude faster, in particular for problems with a large number of clusters.

The boottest() function

# Note: for performance reasons, bootstrap weights of types Rademacher, Webb and Normal
# are sampled via the dqrng package, which is installed with fwildclusterboot
# as a dependency. To set a global seed for boottest() for these weight types, use dqrng's dqset.seed() function.
# For Mammen weights, set a global seed via set.seed().

# set global seed for Rademacher, Webb and Normal weights
library(dqrng)
dqrng::dqset.seed(965326)
# set a global seed for Mammen weights
set.seed(23325)

library(fwildclusterboot)

data(voters)

# fit the model via fixest::feols(), lfe::felm() or stats::lm()

lm_fit <- lm(proposition_vote ~ treatment  + log_income + as.factor(Q1_immigration) + as.factor(Q2_defense), data = voters)
# bootstrap inference via boottest()
lm_boot <- boottest(lm_fit, clustid = c("group_id1"), B = 9999, param = "treatment", seed = 1)
summary(lm_boot)
#> boottest.lm(object = lm_fit, clustid = c("group_id1"), param = "treatment", 
#>     B = 9999, seed = 1)
#>  
#>  Hypothesis: 1*treatment = 0
#>  Observations: 300
#>  Bootstr. Iter: 9999
#>  Bootstr. Type: rademacher
#>  Clustering: 1-way
#>  Confidence Sets: 95%
#>  Number of Clusters: 40
#> 
#>              term estimate statistic p.value conf.low conf.high
#> 1 1*treatment = 0    0.079     3.983       0     0.04     0.118

library(fixest)
feols_fit <- feols(proposition_vote ~ treatment  + log_income | Q1_immigration + Q2_defense, data = voters)
# bootstrap inference via boottest()
feols_boot <- boottest(feols_fit, clustid = c("group_id1"), B = 9999, param = "treatment", seed = 1)
summary(feols_boot)
#> boottest.fixest(object = feols_fit, clustid = c("group_id1"), 
#>     param = "treatment", B = 9999, seed = 1)
#>  
#>  Hypothesis: 1*treatment = 0
#>  Observations: 300
#>   Bootstr. Type: rademacher
#>  Clustering: 1-way
#>  Confidence Sets: 95%
#>  Number of Clusters: 40
#> 
#>              term estimate statistic p.value conf.low conf.high
#> 1 1*treatment = 0    0.079     4.117       0     0.04     0.118

For a longer introduction to the package's key function, boottest(), see the package documentation.

Benchmarks

Timing benchmarks of boottest() with a sample of N = 10000 observations, k = 20 covariates and a single clustering variable with N_G clusters (3 iterations each; the median runtime is plotted).

Installation

You can install compiled versions of fwildclusterboot from CRAN and the development version from R-universe (compiled) or GitHub by following one of the steps below:

# from CRAN 
install.packages("fwildclusterboot")

# from R-universe (Windows & Mac, compiled R > 4.0 required)
install.packages("fwildclusterboot", repos = "https://s3alfisc.r-universe.dev")

# dev version from GitHub
# note: installation requires Rtools
library(devtools)
install_github("s3alfisc/fwildclusterboot")