Sondre U. Solstad
The QuickReg package and associated function provides an easy interface for linear regression in R. This includes the option to request robust and clustered standard errors (equivalent to STATA's ", robust" option), automatic labeling, an easy way to specify multiple regression specifications simultaneously, and a compact html or latex output (relying on the widely used "stargazer" package).
QuickReg also includes a new method to speed up OLS computation. In particular, it offers the option to implement a fixed effect demeaning procedure which demeans a set of covariates and then shares this across multiple regression specifications. In tests (reported below), this reduces calculation time by more than 60 percent for analysis with a large number of fixed effects compared to base R. This relative performance gain is increasing in the number of specifications passed to the function simultaneously.
Written by Sondre U. Solstad, Princeton University (ssolstad@princeton.edu). Send me an email if you find this package useful or want to suggest an improvement or feature.
Installation instructions:
library(devtools)
install_github("sondreus/QuickReg")
library(QuickReg)
# Loading data
mydata <- readRDS("3d_example.RDS")
# Use the QuickReg to produce a regression table
QuickReg(data = mydata,
iv.vars = c("upop", "log_gdppc_mad", "SDI", "SDT", "war", "polity2"),
iv.vars.names = c("Urban Population", "Log(GDPPC)", "Spatial distance to income",
"Spatial distance to technology", "At War", "Polity2 score"),
dv.vars = c("log_adoption_lvl_pc", "distance_to_frontier"),
dv.vars.names = c("Technology Adoption Level", "Distance to Frontier"),
specifications = list( c(1, 4, 5, 6),
c(1, 2, 3, 4),
c(1, 3, 4)),
fixed.effects = c("ccode", "technology", "year"),
fixed.effects.names = c("Country FE", "Year FE", "Tech. FE"),
robust.se = TRUE,
type = "html",
silent = TRUE
)
Dependent variable: | ||||||
Technology Adoption Level | Distance to Frontier | |||||
(1) | (2) | (3) | (4) | (5) | (6) | |
Urban Population | 0.00000\*\*\* | -0.00000 | 0.00000\*\*\* | 0.00000\*\*\* | 0.00000 | 0.00000\*\*\* |
(0.00000) | (0.00000) | (0.00000) | (0.00000) | (0.00000) | (0.00000) | |
Log(GDPPC) | 1.157\*\*\* | 0.161\*\*\* | ||||
(0.071) | (0.011) | |||||
Spatial distance to income | 0.002\*\*\* | 0.001\*\* | -0.0001 | -0.0002\*\* | ||
(0.0004) | (0.0005) | (0.0001) | (0.0001) | |||
Spatial distance to technology | 0.005\*\*\* | 0.004\*\*\* | 0.004\*\*\* | 0.0004\*\*\* | 0.001\*\*\* | 0.001\*\*\* |
(0.0003) | (0.0003) | (0.0004) | (0.0001) | (0.0001) | (0.0001) | |
At War | -0.092 | -0.015 | ||||
(0.069) | (0.012) | |||||
Polity2 score | -0.027\*\*\* | -0.002\*\*\* | ||||
(0.004) | (0.001) | |||||
Constant | 2.730\*\*\* | -9.238\*\*\* | 2.290\*\*\* | 0.510\*\*\* | -1.077\*\*\* | 0.526\*\*\* |
(0.187) | (0.739) | (0.204) | (0.040) | (0.116) | (0.043) | |
Country FE | Yes | Yes | Yes | Yes | Yes | Yes |
Year FE | Yes | Yes | Yes | Yes | Yes | Yes |
Tech. FE | Yes | Yes | Yes | Yes | Yes | Yes |
Observations | 5,994 | 5,884 | 6,240 | 5,943 | 5,833 | 6,189 |
R2 | 0.887 | 0.893 | 0.885 | 0.856 | 0.860 | 0.852 |
Adjusted R2 | 0.884 | 0.890 | 0.881 | 0.851 | 0.856 | 0.848 |
Residual Std. Error | 0.803 (df = 5810) | 0.780 (df = 5706) | 0.814 (df = 6052) | 0.137 (df = 5759) | 0.134 (df = 5655) | 0.139 (df = 6001) |
F Statistic | 249.986\*\*\* (df = 183; 5810) | 270.052\*\*\* (df = 177; 5706) | 248.741\*\*\* (df = 187; 6052) | 186.565\*\*\* (df = 183; 5759) | 196.433\*\*\* (df = 177; 5655) | 184.946\*\*\* (df = 187; 6001) |
Note: | *p<0.1; **p<0.05; ***p<0.01 | |||||
(Robust Standard Errors in Parenthesis) |
- data - Data frame in which all model variables are located.
- iv.vars - Vector of independent variable names in dataset (e.g. c("gdppc", "pop"))
- iv.vars.names - (Optional) Vector of desired independent variable names in table output (e.g. c("GDP per capita", "Population")). Defaults to values in "iv.vars" if none provided.
- dv.vars - Vector of dependent variable in dataset (e.g. c("democracy", "war"))
- dv.vars.names - (Optional) Vector of desired dependent variable names in table output (e.g. c("Democracy (Boix-Rosato-Miller 2012)", "War (with at least 1000 battle deaths)")). Defaults to values in "dv.vars" if none provided.
- specifications - (Optional) List of desired regression specifications (selections of independent variables). The list of regression specifications are applied to all dependent variables. E.g. list(c(1), c(1,2), c(2))).
- fixed.effects - (Optional) Vector of desired fixed effect variable names in dataset (e.g. c("region", "year"))
- fixed.effects.names - (Optional) Vector of desired fixed effects labels in table output (e.g. c("Region FE", "Year FE")). Defaults to values in "fixed.effects" if none provided.
- fixed.effects.specifications - (Optional) List of desired fixed effect specifications (selections of independent variables). These specifications are applied in sequence from the first to last model. If the number of specifications is less than the number of models, all fixed effects are applied in the remaining columns by default. If none provided, defaults to all fixed effects in all models.
- robust.se - (Optional) If TRUE, returns robust standard errors calculated using a sandwich estimator from the "sandwich" package. Defaults to FALSE (i.e. normal standard errors).
- cluster - (Optional) Name of variable in dataset by which cluster-robust standard errors should be computed using the cluster.vcov command of the multiwayvcov package.
- cluster.names - (Optional) Desired name or label of clustering variable to be reported in table output (e.g. "Country" yields a note on the bottom of the table reading "Country-Clustered Standard Errors in Parenthesis"). If cluster specified but no "cluster.names" provided, "Cluster-Robust Standard Errors in Parenthesis" is reported.
- table.title - (Optional) Specifies the title of the table with regression output. Defaults to "QuickReg" plus the date and time of creation in parenthesis.
- out.name - (Optional) Specifies the output file name. Defaults to "QuickReg.html".
- dynamic.out.name - (Optional) If TRUE, adds date and time of creation in brackets between the out.name and the file extension (e.g. QuickReg (2017-04-05-14-01-27).html)
- html.only - (Optional) If TRUE, no latex output produced (only HTML table). Defaults to FALSE.
- type - (Optional) Specifies the type of table output that will be requested from Stargazer. Possible values are: "latex", "html", and "text". Defaults to "latex".
- silent - (Optional) If TRUE, no messages are returned by the function. Defaults to FALSE.
- save.fits - (Optional) If TRUE, saves fitted lm objects in a list by the name "QuickReg.fits" adding an integer if an object by this name already exists. Defaults to FALSE.
- demeaning.acceleration - (Optional) If TRUE, attempts to speed up regression by the method of alternating projections. In particular, it utilizes the "demeanlist" function of the "lfe" package to create a matrix of all covariates demeaned by all fixed effects, and then fits the different regression specifications on this demeaned matrix. Time saved is increasing in the number of fixed effects, specifications and observations, and this method is slower when all these are low. If there are thousands of fixed effects and many specifications, time saved is potentially quite large. Note: Overrides fixed.effects.specifications, always including all variables specified in fixed.effects, and does not supply R-squared or other model statistics. Defaults to FALSE.
- ... - Various options passed to the stargazer function. In particular: stargazer.digits = integer of number of digits to be displayed, stargazer.font.size = font size (e.g. "tiny") if output is latex (no font size is imposed by default), stargazer.style = table style (see "?stargazer_style_list"), stargazer.omit.stat = character vector of model statistics to be omitted from table output.
The QuickReg function is meant to provide a comprehensive and convenient linear regression interface in R. It has been designed with the objective of being intuitive and easy to use at default settings, but with enough options for advanced users. Most importantly, the function is meant to facilitate a smooth, quick and productive workflow.
QuickReg is designed to work seamlessly with knitr and Rmarkdown, and allows output to be requested from stargazer in "latex", "html", or "text" format.
To illustrate the use of QuickReg, consider a researcher considering the linear relationships between a few variables.
N <- 1000
mydata <- cbind.data.frame(rnorm(N), rnorm(N), rnorm(N), rnorm(N), rnorm(N),
rep(seq(1:10), N/10), sample(1:10, N, replace = TRUE))
colnames(mydata) <- c("y", "alternative.y", "x1", "x2", "x3", "group1", "group2")
Let's fit a simple regression in base R:
# Standard R
summary(lm(y ~ x1, data = mydata))
##
## Call:
## lm(formula = y ~ x1, data = mydata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.9844 -0.6539 0.0227 0.6300 3.2018
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.02495 0.03155 0.791 0.429
## x1 -0.02814 0.03210 -0.877 0.381
##
## Residual standard error: 0.9978 on 998 degrees of freedom
## Multiple R-squared: 0.0007692, Adjusted R-squared: -0.000232
## F-statistic: 0.7683 on 1 and 998 DF, p-value: 0.381
And then in QuickReg:
# QuickReg
QuickReg(data = mydata, dv.vars = "y", iv.vars = "x1", type = "html")
Dependent variable: | |
y | |
x1 | -0.028 |
(0.032) | |
Constant | 0.025 |
(0.032) | |
Observations | 1,000 |
R2 | 0.001 |
Adjusted R2 | -0.0002 |
Residual Std. Error | 0.998 (df = 998) |
F Statistic | 0.768 (df = 1; 998) |
Note: | *p<0.1; **p<0.05; ***p<0.01 |
(Normal Standard Errors in Parenthesis) |
Base R:
# Standard R
summary(lm(y ~ x1 + x2 + x3, data = mydata))
##
## Call:
## lm(formula = y ~ x1 + x2 + x3, data = mydata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.0063 -0.6598 0.0200 0.6317 3.1845
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.025526 0.031687 0.806 0.421
## x1 -0.027576 0.032155 -0.858 0.391
## x2 -0.028927 0.031189 -0.927 0.354
## x3 0.008153 0.030693 0.266 0.791
##
## Residual standard error: 0.9983 on 996 degrees of freedom
## Multiple R-squared: 0.001724, Adjusted R-squared: -0.001283
## F-statistic: 0.5733 on 3 and 996 DF, p-value: 0.6327
QuickReg:
# QuickReg
QuickReg(data = mydata, dv.vars = "y", iv.vars = c("x1", "x2", "x3"), type = "html")
Dependent variable: | |
y | |
x1 | -0.028 |
(0.032) | |
x2 | -0.029 |
(0.031) | |
x3 | 0.008 |
(0.031) | |
Constant | 0.026 |
(0.032) | |
Observations | 1,000 |
R2 | 0.002 |
Adjusted R2 | -0.001 |
Residual Std. Error | 0.998 (df = 996) |
F Statistic | 0.573 (df = 3; 996) |
Note: | *p<0.1; **p<0.05; ***p<0.01 |
(Normal Standard Errors in Parenthesis) |
Base R:
# Standard R
summary(lm(y ~ x1, data = mydata))
##
## Call:
## lm(formula = y ~ x1, data = mydata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.9844 -0.6539 0.0227 0.6300 3.2018
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.02495 0.03155 0.791 0.429
## x1 -0.02814 0.03210 -0.877 0.381
##
## Residual standard error: 0.9978 on 998 degrees of freedom
## Multiple R-squared: 0.0007692, Adjusted R-squared: -0.000232
## F-statistic: 0.7683 on 1 and 998 DF, p-value: 0.381
summary(lm(y ~ x2, data = mydata))
##
## Call:
## lm(formula = y ~ x2, data = mydata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.0404 -0.6593 0.0222 0.6398 3.2022
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.02590 0.03158 0.820 0.412
## x2 -0.03004 0.03113 -0.965 0.335
##
## Residual standard error: 0.9977 on 998 degrees of freedom
## Multiple R-squared: 0.0009324, Adjusted R-squared: -6.867e-05
## F-statistic: 0.9314 on 1 and 998 DF, p-value: 0.3347
summary(lm(y ~ x3, data = mydata))
##
## Call:
## lm(formula = y ~ x3, data = mydata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.0493 -0.6595 0.0090 0.6383 3.2268
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.024017 0.031651 0.759 0.448
## x3 0.008293 0.030640 0.271 0.787
##
## Residual standard error: 0.9981 on 998 degrees of freedom
## Multiple R-squared: 7.34e-05, Adjusted R-squared: -0.0009285
## F-statistic: 0.07326 on 1 and 998 DF, p-value: 0.7867
QuickReg:
# QuickReg
QuickReg(data = mydata, dv.vars = "y", iv.vars = c("x1", "x2", "x3"), specifications = list(1, 2, 3),
type = "html")
Dependent variable: | |||
y | |||
(1) | (2) | (3) | |
x1 | -0.028 | ||
(0.032) | |||
x2 | -0.030 | ||
(0.031) | |||
x3 | 0.008 | ||
(0.031) | |||
Constant | 0.025 | 0.026 | 0.024 |
(0.032) | (0.032) | (0.032) | |
Observations | 1,000 | 1,000 | 1,000 |
R2 | 0.001 | 0.001 | 0.0001 |
Adjusted R2 | -0.0002 | -0.0001 | -0.001 |
Residual Std. Error (df = 998) | 0.998 | 0.998 | 0.998 |
F Statistic (df = 1; 998) | 0.768 | 0.931 | 0.073 |
Note: | *p<0.1; **p<0.05; ***p<0.01 | ||
(Normal Standard Errors in Parenthesis) |
We might also want robust standard errors or standard errors clustered at "group1":
Base R: See this guide by Drew Dimmery (it involves specifying a custom function, and then passing fitted models throught the function one at a time).
QuickReg: simply select "robust.se = TRUE" or "cluster ='clustering variable'":
# QuickReg
QuickReg(data = mydata, dv.vars = "y", iv.vars = c("x1", "x2", "x3"), specifications = list(1, 2, 3),
type = "html", robust.se = TRUE)
Dependent variable: | |||
y | |||
(1) | (2) | (3) | |
x1 | -0.028 | ||
(0.033) | |||
x2 | -0.030 | ||
(0.032) | |||
x3 | 0.008 | ||
(0.031) | |||
Constant | 0.025 | 0.026 | 0.024 |
(0.032) | (0.032) | (0.032) | |
Observations | 1,000 | 1,000 | 1,000 |
R2 | 0.001 | 0.001 | 0.0001 |
Adjusted R2 | -0.0002 | -0.0001 | -0.001 |
Residual Std. Error (df = 998) | 0.998 | 0.998 | 0.998 |
F Statistic (df = 1; 998) | 0.768 | 0.931 | 0.073 |
Note: | *p<0.1; **p<0.05; ***p<0.01 | ||
(Robust Standard Errors in Parenthesis) |
Dependent variable: | |||
y | |||
(1) | (2) | (3) | |
x1 | -0.028 | ||
(0.026) | |||
x2 | -0.030 | ||
(0.028) | |||
x3 | 0.008 | ||
(0.026) | |||
Constant | 0.025 | 0.026 | 0.024 |
(0.039) | (0.039) | (0.040) | |
Observations | 1,000 | 1,000 | 1,000 |
R2 | 0.001 | 0.001 | 0.0001 |
Adjusted R2 | -0.0002 | -0.0001 | -0.001 |
Residual Std. Error (df = 998) | 0.998 | 0.998 | 0.998 |
F Statistic (df = 1; 998) | 0.768 | 0.931 | 0.073 |
Note: | *p<0.1; **p<0.05; ***p<0.01 | ||
(Cluster-Robust Standard Errors in Parenthesis) |
# QuickReg
QuickReg(data = mydata, dv.vars = "y", iv.vars = c("x1", "x2", "x3"),
specifications = list(1, 2, 3, c(1, 3), c(1,2), c(2, 3), c(1,2,3)),
type = "html", robust.se = TRUE)
Dependent variable: | |||||||
y | |||||||
(1) | (2) | (3) | (4) | (5) | (6) | (7) | |
x1 | -0.028 | -0.029 | -0.027 | -0.028 | |||
(0.033) | (0.033) | (0.033) | (0.033) | ||||
x2 | -0.030 | -0.029 | -0.030 | -0.029 | |||
(0.032) | (0.033) | (0.033) | (0.033) | ||||
x3 | 0.008 | 0.009 | 0.007 | 0.008 | |||
(0.031) | (0.031) | (0.031) | (0.031) | ||||
Constant | 0.025 | 0.026 | 0.024 | 0.024 | 0.026 | 0.025 | 0.026 |
(0.032) | (0.032) | (0.032) | (0.032) | (0.032) | (0.032) | (0.032) | |
Observations | 1,000 | 1,000 | 1,000 | 1,000 | 1,000 | 1,000 | 1,000 |
R2 | 0.001 | 0.001 | 0.0001 | 0.001 | 0.002 | 0.001 | 0.002 |
Adjusted R2 | -0.0002 | -0.0001 | -0.001 | -0.001 | -0.0003 | -0.001 | -0.001 |
Residual Std. Error | 0.998 (df = 998) | 0.998 (df = 998) | 0.998 (df = 998) | 0.998 (df = 997) | 0.998 (df = 997) | 0.998 (df = 997) | 0.998 (df = 996) |
F Statistic | 0.768 (df = 1; 998) | 0.931 (df = 1; 998) | 0.073 (df = 1; 998) | 0.430 (df = 2; 997) | 0.825 (df = 2; 997) | 0.492 (df = 2; 997) | 0.573 (df = 3; 996) |
Note: | *p<0.1; **p<0.05; ***p<0.01 | ||||||
(Robust Standard Errors in Parenthesis) |
# QuickReg
QuickReg(data = mydata, dv.vars = c("y", "alternative.y"), iv.vars = c("x1", "x2", "x3"), specifications = list(1, 2, 3),
type = "html", robust.se = TRUE)
Dependent variable: | ||||||
y | alternative.y | |||||
(1) | (2) | (3) | (4) | (5) | (6) | |
x1 | -0.028 | -0.019 | ||||
(0.033) | (0.032) | |||||
x2 | -0.030 | 0.015 | ||||
(0.032) | (0.030) | |||||
x3 | 0.008 | 0.089\*\*\* | ||||
(0.031) | (0.030) | |||||
Constant | 0.025 | 0.026 | 0.024 | -0.029 | -0.030 | -0.036 |
(0.032) | (0.032) | (0.032) | (0.031) | (0.031) | (0.031) | |
Observations | 1,000 | 1,000 | 1,000 | 1,000 | 1,000 | 1,000 |
R2 | 0.001 | 0.001 | 0.0001 | 0.0004 | 0.0003 | 0.009 |
Adjusted R2 | -0.0002 | -0.0001 | -0.001 | -0.001 | -0.001 | 0.008 |
Residual Std. Error (df = 998) | 0.998 | 0.998 | 0.998 | 0.971 | 0.971 | 0.966 |
F Statistic (df = 1; 998) | 0.768 | 0.931 | 0.073 | 0.372 | 0.257 | 8.965\*\*\* |
Note: | *p<0.1; **p<0.05; ***p<0.01 | |||||
(Robust Standard Errors in Parenthesis) |
# QuickReg
QuickReg(data = mydata, dv.vars = c("y", "alternative.y"), iv.vars = c("x1", "x2", "x3"), fixed.effects = c("group1", "group2"), specifications = list(1, 2, 3),
type = "html", robust.se = TRUE)
Dependent variable: | ||||||
y | alternative.y | |||||
(1) | (2) | (3) | (4) | (5) | (6) | |
x1 | -0.027 | -0.018 | ||||
(0.033) | (0.031) | |||||
x2 | -0.033 | 0.023 | ||||
(0.032) | (0.029) | |||||
x3 | 0.021 | 0.080\*\*\* | ||||
(0.030) | (0.030) | |||||
Constant | -0.008 | -0.006 | -0.004 | 0.101 | 0.108 | 0.095 |
(0.148) | (0.148) | (0.148) | (0.127) | (0.127) | (0.127) | |
group1 | Yes | Yes | Yes | Yes | Yes | Yes |
group2 | Yes | Yes | Yes | Yes | Yes | Yes |
Observations | 1,000 | 1,000 | 1,000 | 1,000 | 1,000 | 1,000 |
R2 | 0.025 | 0.026 | 0.025 | 0.025 | 0.025 | 0.032 |
Adjusted R2 | 0.007 | 0.007 | 0.006 | 0.006 | 0.006 | 0.013 |
Residual Std. Error (df = 980) | 0.994 | 0.994 | 0.995 | 0.967 | 0.967 | 0.964 |
F Statistic (df = 19; 980) | 1.345 | 1.366 | 1.332 | 1.329 | 1.340 | 1.701\*\* |
Note: | *p<0.1; **p<0.05; ***p<0.01 | |||||
(Robust Standard Errors in Parenthesis) |
# QuickReg
QuickReg(data = mydata,
table.title = "My Regression Results",
dv.vars = c("y", "alternative.y"),
dv.vars.names = c("Outcome", "Alternative Outcome"),
iv.vars = c("x1", "x2", "x3"),
iv.vars.names = c("Variable 1", "Variable 2", "Variable 3"),
fixed.effects = c("group1", "group2"),
fixed.effects.names = c("Group 1 FE", "Group 2 FE"),
specifications = list(1, 2, 3),
cluster = "group1",
cluster.names = "Group 1",
type = "html")
Dependent variable: | ||||||
Outcome | Alternative Outcome | |||||
(1) | (2) | (3) | (4) | (5) | (6) | |
Variable 1 | -0.027 | -0.018 | ||||
(0.023) | (0.047) | |||||
Variable 2 | -0.033 | 0.023 | ||||
(0.031) | (0.021) | |||||
Variable 3 | 0.021 | 0.080\*\*\* | ||||
(0.028) | (0.026) | |||||
Constant | -0.008 | -0.006 | -0.004 | 0.101 | 0.108 | 0.095 |
(0.120) | (0.125) | (0.125) | (0.082) | (0.079) | (0.076) | |
Group 1 FE | Yes | Yes | Yes | Yes | Yes | Yes |
Group 2 FE | Yes | Yes | Yes | Yes | Yes | Yes |
Observations | 1,000 | 1,000 | 1,000 | 1,000 | 1,000 | 1,000 |
R2 | 0.025 | 0.026 | 0.025 | 0.025 | 0.025 | 0.032 |
Adjusted R2 | 0.007 | 0.007 | 0.006 | 0.006 | 0.006 | 0.013 |
Residual Std. Error (df = 980) | 0.994 | 0.994 | 0.995 | 0.967 | 0.967 | 0.964 |
F Statistic (df = 19; 980) | 1.345 | 1.366 | 1.332 | 1.329 | 1.340 | 1.701\*\* |
Note: | *p<0.1; **p<0.05; ***p<0.01 | |||||
(Group 1-Clustered Standard Errors in Parenthesis) |
With a large number of fixed effects, regression analysis can take a very long time. QuickReg offers a solution to this problem. First, QuickReg implements the method of alternating projections, which takes advantage of the fact that fixed effects are equivalent to "demeaning" covariates by the levels of the fixed effects. If the number of fixed effects are large, it can be faster to demean than to invert matricies. This procedure is implemented through the demean.list function in the lfe package.
Secondly, and more importantly for speed purposes, one often wants to calculate results for a number of different specifications of IVs and DVs with the same fixed effects and sample of observations. QuickReg is suitable for such cases for several reasons, the first being that it provides a convenient interface for listing specifications, and second because it summarizes model results in the familiar and concise table format with columns corresponding to different models. QuickReg is also able to speed up the calculations of such tables significantly by applying the demeaning procedure to a single covariate matrix shared by all specifications. While standard implementations calculate fixed effects repeatedly for different specifications, it is here only done once, and then shared across all specifications. Two words of caution are in order: (1) this limits calculations to a common sample, and (2) it makes fitted objects' model statistics (e.g. R-squared) meaningless (these are removed from the resultant table automatically). Coefficient confidence intervals can and are however still calculated correctly, and all QuickReg options (including to calculate robust SEs) are available.
library("microbenchmark")
N <- 1000
mydata <- cbind.data.frame(rnorm(N), rnorm(N), rnorm(N), rnorm(N), rnorm(N),
rep(seq(1:100), N/100), sample(1:100, N, replace = TRUE))
colnames(mydata) <- c("y", "alternative.y", "x1", "x2", "x3", "group1", "group2")
# Testing performance gain:
speed.test <- suppressWarnings(microbenchmark(QuickReg(data = mydata, iv.vars = c("x1", "x2", "x3"),
dv.vars = c("y", "alternative.y"),
specifications = list( c(1),
c(2, 3),
c(1, 2, 3)),
fixed.effects = c("group1", "group2"),
html.only = TRUE,
silent = TRUE,
out.name = "QuickReg.normal",
# Demeaning acceleration is set to FALSE (default)
demeaning.acceleration = FALSE
),
QuickReg(data = mydata, iv.vars = c("x1", "x2", "x3"),
dv.vars = c("y", "alternative.y"),
specifications = list( c(1),
c(2, 3),
c(1, 2, 3)),
fixed.effects = c("group1", "group2"),
html.only = TRUE,
silent = TRUE,
out.name = "QuickReg.fast",
# Demeaning acceleration is set to TRUE (not default)
demeaning.acceleration = TRUE
),
# Specifying number of trails.
times = 20))
speed.test
## Unit: milliseconds
## expr
## QuickReg(data = mydata, iv.vars = c("x1", "x2", "x3"), dv.vars = c("y", "alternative.y"), specifications = list(c(1), c(2, 3), c(1, 2, 3)), fixed.effects = c("group1", "group2"), html.only = TRUE, silent = TRUE, out.name = "QuickReg.normal", demeaning.acceleration = FALSE)
## QuickReg(data = mydata, iv.vars = c("x1", "x2", "x3"), dv.vars = c("y", "alternative.y"), specifications = list(c(1), c(2, 3), c(1, 2, 3)), fixed.effects = c("group1", "group2"), html.only = TRUE, silent = TRUE, out.name = "QuickReg.fast", demeaning.acceleration = TRUE)
## min lq mean median uq max neval cld
## 1640.0228 2074.391 2359.198 2286.073 2386.463 4618.851 20 b
## 920.2494 1101.517 1250.377 1196.808 1377.422 1688.904 20 a
print( paste("Total number of observations:", N))
## [1] "Total number of observations: 1000"
print( paste( "Total number of fixed effects:", length(unique(mydata[, "group1"])) + length(unique(mydata[, "group2"]))))
## [1] "Total number of fixed effects: 200"
In the above example, QuickReg's acceleration reduced the time spent by more than 60 percent relative to the standard R (the "lm()"-function which QuickReg relies on by default). Gains can be even larger when we increase the number of fixed effects, as in the below example:
library("microbenchmark")
N <- 10000
mydata <- cbind.data.frame(rnorm(N), rnorm(N), rnorm(N), rnorm(N), rnorm(N),
rep(seq(1:1000), N/1000), sample(1:100, N, replace = TRUE))
colnames(mydata) <- c("y", "alternative.y", "x1", "x2", "x3", "group1", "group2")
# Testing performance gain:
speed.test <- suppressWarnings(microbenchmark(QuickReg(data = mydata, iv.vars = c("x1", "x2", "x3"),
dv.vars = c("y", "alternative.y"),
specifications = list( c(1),
c(2, 3),
c(1, 2, 3)),
fixed.effects = c("group1", "group2"),
html.only = TRUE,
silent = TRUE,
out.name = "QuickReg.normal.html",
# Demeaning acceleration is set to FALSE (default)
demeaning.acceleration = FALSE
),
QuickReg(data = mydata, iv.vars = c("x1", "x2", "x3"),
dv.vars = c("y", "alternative.y"),
specifications = list( c(1),
c(2, 3),
c(1, 2, 3)),
fixed.effects = c("group1", "group2"),
html.only = TRUE,
silent = TRUE,
out.name = "QuickReg.fast.html",
# Demeaning acceleration is set to TRUE (not default)
demeaning.acceleration = TRUE
),
# Specifying number of trails.
times = 10))
speed.test
## Unit: seconds
## expr
## QuickReg(data = mydata, iv.vars = c("x1", "x2", "x3"), dv.vars = c("y", "alternative.y"), specifications = list(c(1), c(2, 3), c(1, 2, 3)), fixed.effects = c("group1", "group2"), html.only = TRUE, silent = TRUE, out.name = "QuickReg.normal.html", demeaning.acceleration = FALSE)
## QuickReg(data = mydata, iv.vars = c("x1", "x2", "x3"), dv.vars = c("y", "alternative.y"), specifications = list(c(1), c(2, 3), c(1, 2, 3)), fixed.effects = c("group1", "group2"), html.only = TRUE, silent = TRUE, out.name = "QuickReg.fast.html", demeaning.acceleration = TRUE)
## min lq mean median uq max neval cld
## 77.87794 84.15044 89.29405 87.46742 97.41462 100.89696 10 b
## 26.64743 30.98438 31.80409 32.15318 32.95510 36.11876 10 a
print( paste("Total number of observations:", N))
## [1] "Total number of observations: 10000"
print( paste( "Total number of fixed effects:", length(unique(mydata[, "group1"])) + length(unique(mydata[, "group2"]))))
## [1] "Total number of fixed effects: 1100"
This package relies on the stargazer package by Marek Hlavac, the sandwich package by Thomas Lumley and Achim Zeileis, the lfe package by Simen Gaure, the multivcov package by Nathaniel Graham and Mahmood Arai and Björn Hagströmer, and the lmtest package by Torsten Hothorn, Achim Zeileis, Richard W. Farebrother, Clint Cummins, Giovanni Millo, and David Mitchell.
See also:
Hlavac, Marek (2015). stargazer: Well-Formatted Regression and Summary Statistics Tables. R package version 5.2. http://CRAN.R-project.org/package=stargazer
Solstad, Sondre Ulvund (2018). QuickReg: A Fast and Easy OLS Interface in R. https://github.com/sondreus/QuickReg#quickreg