This homework focuses on hierarchical models using the 8 schools dataset. BDA3 describes the dataset as follows:
A study was performed for the Educational Testing Service to analyze the effects of special coaching sessions on test scores. Separate randomized experiments were performed to estimate the effects of coaching programs for the SAT-V (verbal) in each of eight high schools. The outcome variable in each study was the score on a special administration of the SAT-V. The test scores can vary between 200 and 800, with mean about 500 and standard deviation about 100.... The scores are reported as estimated coaching effects and standard deviations at each school.
Using the Stan code provided below
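data {
  int<lower=0> J;            // number of schools
  vector[J] y;               // estimated treatment effect (school j)
  vector<lower=0>[J] sigma;  // std err of effect estimate (school j)
}
parameters {
  real mu;
  vector[J] theta;
  real<lower=0> tau;
}
model {
  // vectors replace the old real y[J] array syntax, which recent Stan releases reject
  // mu and tau have no explicit priors, so Stan uses implicit flat priors
  theta ~ normal(mu, tau);
  y ~ normal(theta, sigma);
}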
and the following R code (you'll need to supply the path to the Stan file above, saved here as 8schools.stan)
library(rstan)
schools_dat <- list(J = 8,
                    y = c(28, 8, -3, 7, -1, 1, 18, 12),
                    sigma = c(15, 10, 16, 11, 9, 11, 10, 18))
fit <- stan(file = '8schools.stan', data = schools_dat)
print(fit)
plot(fit)
to fit the model, answer the following questions.
What do mu, theta, and tau represent in the Stan code?
Summarize the output generated by the Stan code.
Why do the y-values differ from the theta values?
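As a reference point for the questions above, the two sampling statements in the Stan program can be written in mathematical notation. This is only a restatement of the code, not an interpretation of it. Note that Stan's normal(., .) takes a standard deviation as its second argument, and that the flat prior in the last line is what Stan assumes implicitly when no prior is declared for a parameter.

```latex
\begin{align*}
y_j \mid \theta_j &\sim \mathrm{Normal}(\theta_j,\ \sigma_j), \qquad j = 1, \dots, J, \\
\theta_j \mid \mu, \tau &\sim \mathrm{Normal}(\mu,\ \tau), \\
p(\mu, \tau) &\propto 1, \qquad \tau > 0 .
\end{align*}
```

Here the sigma_j are treated as known data, while mu, tau, and theta_1, ..., theta_J are the quantities Stan samples.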
Run the following code to launch shinystan, which is an interactive way to view MCMC results.
library(shinystan)
launch_shinystan_demo()
Explore the interactive GUI, which shows the same analysis you conducted in part 1 (the GUI can be launched for any Stan analysis).
Describe the features in the Estimate tab.
Describe the features in the Explore tab.
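Since the GUI can be launched for any Stan analysis, you may also want to open it on your own fit rather than the built-in demo. The following is a minimal sketch, assuming that fit is the stanfit object returned by stan() in part 1 and that you are working in an interactive R session; the can_launch variable is only an illustrative guard and is not part of shinystan.

```r
# Sketch: open ShinyStan on your own fit instead of the built-in demo.
# Assumption: `fit` is the stanfit object returned by stan() in part 1.
can_launch <- interactive() &&
  exists("fit") &&
  requireNamespace("shinystan", quietly = TRUE)

if (can_launch) {
  # Opens the GUI in your default browser; R is busy until you close it.
  shinystan::launch_shinystan(fit)
} else {
  message("Run this in an interactive R session after fitting the model in part 1.")
}
```

If the guard fails, the snippet prints a reminder instead of raising an error, so it is safe to leave in a script that is also run non-interactively.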
Read and summarize the blog post "Everything I need to know about Bayesian statistics, I learned in eight schools."