LOST-STATS/lost-stats.github.io

Event study in Stata

Opened this issue · 1 comments

First, thanks a lot for putting this out for learners like us. I was wondering if i can use the following for repeated cross sectional data as well?

Code copy and pasted from the diff-in-diff event study section************

use "https://raw.githubusercontent.com/LOST-STATS/LOST-STATS.github.io/master/Model_Estimation/Data/Event_Study_DiD/bacon_example.dta", clear

  • create the lag/lead for treated states

  • fill in control obs with 0

  • This allows for the interaction between treat and time_to_treat to occur for each state.

  • Otherwise, there may be some NAs and the estimations will be off.
    g time_to_treat = year - _nfd
    replace time_to_treat = 0 if missing(_nfd)

  • this will determine the difference

  • btw controls and treated states
    g treat = !missing(_nfd)

  • Stata won't allow factors with negative values, so let's shift

  • time-to-treat to start at 0, keeping track of where the true -1 is
    summ time_to_treat
    g shifted_ttt = time_to_treat - r(min)
    summ shifted_ttt if time_to_treat == -1
    local true_neg1 = r(mean)

  • Regress on our interaction terms with FEs for group and year,

  • clustering at the group (state) level

  • use ib# to specify our reference group
    reghdfe asmrs ib`true_neg1'.shifted_ttt pcinc asmrh cases, a(stfips year) vce(cluster stfips)

  • Pull out the coefficients and SEs
    g coef = .
    g se = .
    levelsof shifted_ttt, l(times)
    foreach t in times' { replace coef = _b[t'.shifted_ttt] if shifted_ttt == t' replace se = _se[t'.shifted_ttt] if shifted_ttt == `t'
    }

  • Make confidence intervals
    g ci_top = coef+1.96se
    g ci_bottom = coef - 1.96
    se

  • Limit ourselves to one observation per quarter

  • now switch back to time_to_treat to get original timing
    keep time_to_treat coef se ci_*
    duplicates drop

sort time_to_treat

  • Create connected scatterplot of coefficients
  • with CIs included with rcap
  • and a line at 0 both horizontally and vertically
    summ ci_top
    local top_range = r(max)
    summ ci_bottom
    local bottom_range = r(min)

twoway (sc coef time_to_treat, connect(line)) ///
(rcap ci_top ci_bottom time_to_treat) ///
(function y = 0, range(time_to_treat)) ///
(function y = 0, range(bottom_range' top_range') horiz), ///
xtitle("Time to Treatment") caption("95% Confidence Intervals Shown")

Yep, should work fine for DID performed with repeated cross section data.