The TestMechs package implements the methodology from the paper
“Testing
Mechanisms”
by Soonwoo Kwon and Jonathan Roth. The package provides tests for the
“sharp null of full mediation”, which conjectures that the effect of a
treatment operates through a particular conjectured mechanism (or set of
mechanisms) M. It also provides lower bounds on the fraction of
“always-takers” who are affected by the treatment despite having the
same value of M regardless of treatment status. For now, the package
assumes that the treatment is as good as randomly assigned (as in an
RCT); we hope to add support for conditional random assignment in future
iterations of the package. Note that the approach in the paper requires
the mediator
You can install the development version of TestMechs from GitHub with:
# Install devtools if not already installed
install.packages("devtools")
# Install package
devtools::install_github("jonathandroth/TestMechs")
We illustrate how the package can be used by walking through how the
code can be applied to the application of Baranov et al. (2020) in
Section 5.2 of “Testing
Mechanisms”.
In Baranov et al. (2020),
We start with loading the required packages and the data.
# Load TestMechs
library(TestMechs)
# Load other packages that are required to run the example
library(dplyr)
library(ggplot2)
library(haven)
# Load data
data("baranov_data")
# Restrict to the experimental sample
mother_data <- mother_data %>% filter(THP_sample == 1)
We begin with the case where
The main functions we will be using are: 1) partial_density_plot()
to
plot the partial densities to visually detect potential violations of
the sharp null, 2) test_sharp_null()
to conduct a statistical test
for the sharp null and 3) lb_frac_affected()
to compute the sharp
lower bound for the fraction of always-takers (or never-takers) that are
affected by treatment. While this example covers the basic usage of
these functions, please refer to the documentation of each function
(e.g., ?test_sharp_null
) for a more detailed description.
We first provide graphical evidence using a partial density plot using
the function partial_density_plot()
. When
nt_plot <-
partial_density_plot(df = mother_data,
d = "treat",
m = "grandmother",
y = "motherfinancial",
num_Ybins = 5,
plot_nts = T,
density_1_label = "Prob. In Treated Group (P(Y,M=0|D=1))",
density_0_label = "Prob. In Control Group (P(Y,M=0|D=0))") +
ylab("Probability") +
xlab("Event: No Grandmother Present (M=0) and Y in Stated Range")
nt_plot +
annotate(geom = "text",
x = 5,y=0.15,
label = "Higher prob. in \n treated group, \n violating sharp null", size = 3) +
geom_segment(x=4.7,y=0.17,
xend = 4.5, yend = 0.175,
arrow = arrow(length = unit(0.03,"npc")),
color = "black", show.legend = F) +
geom_segment(x=5,y=0.13,
xend = 5, yend = 0.10,
arrow = arrow(length = unit(0.03,"npc")),
color = "black", show.legend = F) +
theme(legend.position="bottom",
legend.title = element_blank())
This figure shows estimates of
The argument plot_nts = T
tells the package to make a plot showing the
inequalities corresponding to there being no treatment effect for the
“never-takers” (i.e. individuals with plot_nts
is set to F
, we get a similar plot for the
always-takers, which checks whether
at_plot <-
partial_density_plot(df = mother_data,
d = "treat",
m = "grandmother",
y = "motherfinancial",
num_Ybins = 5,
plot_nts = F,
density_1_label = "Prob. In Treated Group (P(Y,M=1|D=1))",
density_0_label = "Prob. In Control Group (P(Y,M=1|D=0))") +
ylab("Probability") +
xlab("Event: Grandmother Present (M=1) and Y in Stated Range")
at_plot +
theme(legend.position="bottom",
legend.title = element_blank())
While the figure above hints at a possible violation of the sharp
null, it does not come with any uncertainty quantification. The
function test_sharp_null()
conducts statistical inference of the
sharp null using the method described in Section 4 of the paper. The
following snippet runs this test based on the test proposed in Cox and
Shi (2023), which is our recommended approach for most applications. The
package supports using the tests provided by Andrews, Roth, and Pakes
(2023) and Fang, Santos, Shaikh, and Torgovitsky (2023); these methods
can be specified by changing the method
argument from "CS"
to
"ARP"
or "FSST"
. When M is binary, as in our example here, one can
also use the test from Kitagawa (2015) by setting method = "toru"
.
test_result <- test_sharp_null(df = mother_data,
d = "treat",
m = "grandmother",
y = "motherfinancial",
method = "CS", #use Cox and Shi test
num_Ybins = 5, #discretize using 5 bins
cluster = "uc") #cluster SEs at uc level
#> Loading required package: lpinfer
test_result$pval
#> [,1]
#> [1,] 0.02283916
The test gives a p-value of 0.023, and thus the sharp null is rejected
at the 5% significance level. Here, the p-value corresponds to the
smallest value of num_Ybins = 5
. Currently, the function discretizes num_Ybins
value is not provided but
The test above suggests that the treatment effect does not operate
entirely through the presence of a grandmother in the home. There are
some people (never-takers) whose outcome is affected by the treatment
despite having no change in lb_frac_affected
computes (a point estimate of) this
lower bound. The argument at_group = 0
corresponds to computing this
lower bound for the never-takers, who are referred to as “0-always
takers” in the more general notation in the paper.
lb_nts <- lb_frac_affected(df = mother_data,
d = "treat",
m = "grandmother",
y = "motherfinancial",
num_Ybins = 5,
at_group = 0)
lb_nts
#> [1] 0.1858912
Our estimates of the lower bound imply that at least 19 percent of
never-takers are affected by the treatment. One could likewise test the
fraction of never-takers affected by setting at_group = 1
(in this
case, the lower bound is zero). If at_group
is set to NULL
, then the
package calculates the fraction pooling across all types that have the
same value of
By default, TestMechs imposes the monotonicity assumption that the
treatment can only increase the value of max_defiers_share
parameter to be non-zero, which bounds the number of
“defiers” by max_defiers_share
.
We rerun the test above with max_defiers_share = .01
, which allows one
percent of the population to be defiers.
test_result_defiers <- test_sharp_null(df = mother_data,
d = "treat",
m = "grandmother",
y = "motherfinancial",
method = "CS",
num_Ybins = 5,
cluster = "uc",
max_defiers_share = .01)
test_result_defiers$pval
#> [,1]
#> [1,] 0.04630939
The p-value increases to 0.046, so the test rejects the sharp null even if you allow one percent of the population to be defiers. (Allowing for larger shares of defiers will eventually lead to an insignificant result.)
Likewise, we can also calculate the lower bound on the fraction of never-takers under this relaxed monotonicity.
lb_nts_defiers <- lb_frac_affected(df = mother_data,
d = "treat",
m = "grandmother",
y = "motherfinancial",
num_Ybins = 5,
at_group = 0,
max_defiers_share = .01)
lb_nts_defiers
#> [1] 0.1716415
Our estimates of the lower bound imply that at least 17 percent of never-takers are affected by the treatment when we allow one percent of the population to be defiers.
We next turn to the setting where we are interested in testing whether the effect is mediated by relationship quality with the husband, which is measured on a 1-5 scale. We can again test the sharp null and estimate a lower bound on the fraction affected.
test_sharp_null(df = mother_data,
d = "treat",
m = "relationship_husb",
y = "motherfinancial",
num_Ybins = 5,
method = "CS",
cluster = "uc")$pval
#> [,1]
#> [1,] 0.02838332
Again, we reject the sharp null that all the treatment effect goes through the relationship quality with the husband.
We can also estimate a lower bound on the fraction of always-takers:
lb_frac_affected(df = mother_data,
d = "treat",
m = "relationship_husb",
y = "motherfinancial",
num_Ybins = 5,
at_group = NULL,
allow_min_defiers = TRUE)
#> [1] 0.1002207
Here, the parameter at_group = NULL
asks the function compute the
lower bound on the fraction of always-takers, pooled across different
allow_min_defiers = TRUE
calculates the lower bound allowing for the minimum number of defiers
consistent with the empirical distribution — see footnote 25 of the
paper for details.)
We note that while the method works with multi-valued discrete mediators
(such as our 1-5 score), we generally expect the power of the test to
decrease as one approaches an approximately continuous mediator. See the
discussion in Remarks 2 and 3 of the paper regarding power and
discretization of
Finally, we test the null hypothesis that the treatment effect is
explained by the combination of the two mechanisms. This is done by
passing a vector of variables names for the m
argument.
test_result_both <- test_sharp_null(df = mother_data,
d = "treat",
m = c("relationship_husb",
"grandmother"),
y = "motherfinancial",
num_Ybins = 5,
method = "CS",
cluster = "uc")
test_result_both$pval
#> [,1]
#> [1,] 0.6540865
With a p-value of 0.654, we cannot reject the sharp null that the combination of presence of grandmother and relationship quality with husband fully explain the treatment effect.
Again, we can estimate a lower bound on the fraction of those affected
by treatment, pooled across different
lb_frac_both <- lb_frac_affected(df = mother_data,
d = "treat",
m = c("relationship_husb",
"grandmother"),
y = "motherfinancial",
num_Ybins = 5,
allow_min_defiers = TRUE)
lb_frac_both
#> [1] 0.07251284
We estimate a lower bound of 7 percent, although this does not appear to be statistically significant given the test result above.