Difference-in-differences estimation and inference for Python
It covers the following use cases:
- Balanced panels, unbalanced panels & repeated cross-section
- Two + Multiple time periods
- Fixed + Staggered treatment timing
- Binary + Multi-Valued treatment
- Heterogeneous treatment effects & triple difference
- One + Multiple treatments per entity
See the documentation for more details.
The latest release can be installed using pip:
pip install differences
It requires Python >= 3.8.
The ATTgt class implements the estimation procedures suggested by Callaway and Sant'Anna (2021) and Sant'Anna and Zhao (2020), as well as the multi-valued treatment case discussed in Callaway, Goodman-Bacon & Sant'Anna (2021).
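from differences import ATTgt, simulate_data
df = simulate_data()  # simulated panel data, including a 'cohort' column
att_gt = ATTgt(data=df, cohort_name='cohort')
att_gt.fit(formula='y')  # estimate group-time average treatment effects for outcome y
att_gt.aggregate('event')  # aggregate the estimates by event time (periods relative to treatment)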
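To give a feel for what a single group-time effect ATT(g, t) and an event-time aggregation look like, here is a minimal pure-Python sketch. It assumes a balanced panel, no covariates, never-treated units as the only comparison group, g - 1 as the base period, and post-treatment event times only (e >= 0). The helpers att_gt_2x2 and event_study and the toy data are hypothetical and not part of differences. The cohort-size weighting is a simplified version of the event-study aggregation in Callaway and Sant'Anna (2021), not a description of what aggregate('event') does internally. The cited estimators also handle covariates, doubly robust estimation and inference, which this sketch omits.

```python
from statistics import mean

# Hypothetical toy layout: panel[entity][time] -> outcome,
# cohort[entity] -> first treated period, or None if never treated.


def att_gt_2x2(panel, cohort, g, t):
    """Unconditional ATT(g, t): 2x2 DiD of cohort g vs never-treated, base period g - 1."""
    base = g - 1
    treated = [e for e, c in cohort.items() if c == g]
    never = [e for e, c in cohort.items() if c is None]
    change_treated = mean(panel[e][t] - panel[e][base] for e in treated)
    change_never = mean(panel[e][t] - panel[e][base] for e in never)
    return change_treated - change_never


def event_study(panel, cohort):
    """Average ATT(g, g + e) over cohorts for each e >= 0, weighting by cohort size."""
    last = max(t for outcomes in panel.values() for t in outcomes)
    sizes = {}
    for c in cohort.values():
        if c is not None:
            sizes[c] = sizes.get(c, 0) + 1
    max_e = max(last - g for g in sizes)
    result = {}
    for e in range(max_e + 1):
        # only cohorts for which period g + e is observed contribute
        groups = [g for g in sizes if g + e <= last]
        total = sum(sizes[g] for g in groups)
        result[e] = sum(sizes[g] * att_gt_2x2(panel, cohort, g, g + e) for g in groups) / total
    return result


# Toy data: A, B treated from period 2; C from period 3; D, E never treated.
panel = {
    "A": {1: 0.0, 2: 2.0, 3: 4.0, 4: 6.0},
    "B": {1: 1.0, 2: 3.0, 3: 5.0, 4: 7.0},
    "C": {1: 2.0, 2: 3.0, 3: 6.0, 4: 9.0},
    "D": {1: 0.0, 2: 1.0, 3: 2.0, 4: 3.0},
    "E": {1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0},
}
cohort = {"A": 2, "B": 2, "C": 3, "D": None, "E": None}

print(att_gt_2x2(panel, cohort, g=2, t=2))  # → 1.0
event_effects = event_study(panel, cohort)
```

Here event_effects[0] averages ATT(2, 2) = 1.0 and ATT(3, 3) = 2.0 with weights 2 and 1 (cohort sizes), giving 4/3.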
The differences ATTgt class benefitted substantially from the original authors' R packages: Callaway & Sant'Anna's did and Sant'Anna & Zhao's DRDID.
NOTE on performance: the ATTgt class currently accepts string entity identifiers, as in the example above, where df = simulate_data() returns data whose first index level (the entity identifiers) has a string dtype. The ATT computation (when calling .fit()) would be much faster if you cast the entity identifiers to integers before initializing ATTgt, which you can easily do with pandas category codes.
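A minimal sketch of the category-code conversion, assuming (as the note describes) a DataFrame whose first index level holds the entity identifiers, e.g. an (entity, time) MultiIndex. The helper entities_to_int and the toy panel are hypothetical; only the pandas calls (pd.Categorical, .codes, .categories, pd.MultiIndex.from_arrays) are standard pandas API.

```python
from typing import Tuple

import pandas as pd


def entities_to_int(df: pd.DataFrame) -> Tuple[pd.DataFrame, pd.Index]:
    """Return a copy of df whose first index level holds integer entity codes.

    Hypothetical helper: assumes the first index level holds the entity
    identifiers. Also returns the original labels, so code i maps back to
    entity_labels[i].
    """
    out = df.copy()
    levels = [out.index.get_level_values(i) for i in range(out.index.nlevels)]
    entity = pd.Categorical(levels[0])  # categories are the sorted unique labels
    levels[0] = entity.codes  # integer codes, one per distinct entity
    out.index = pd.MultiIndex.from_arrays(levels, names=out.index.names)
    return out, entity.categories


# Toy panel with string entity identifiers (hypothetical data).
panel = pd.DataFrame(
    {"y": [1.0, 2.0, 3.0, 4.0]},
    index=pd.MultiIndex.from_tuples(
        [("unit_b", 1), ("unit_b", 2), ("unit_a", 1), ("unit_a", 2)],
        names=["entity", "time"],
    ),
)
panel_int, entity_labels = entities_to_int(panel)
```

With the simulated data from the example above, you would call panel_int, entity_labels = entities_to_int(df) and then pass panel_int to ATTgt(data=panel_int, cohort_name='cohort'). Keeping entity_labels lets you translate the integer codes back to the original identifiers afterwards.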
from differences import TWFE, simulate_data
df = simulate_data()
twfe = TWFE(data=df, cohort_name='cohort')  # two-way fixed effects (TWFE) estimator
twfe.fit(formula='y')  # fit for outcome y on the same simulated data
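For intuition about the two-way fixed effects regression y_it = α_i + λ_t + β·D_it + ε_it, here is a minimal NumPy sketch. It assumes a balanced panel, a single binary treatment indicator D and no covariates, and returns only the point estimate of β. It is not the TWFE class's implementation; the helper twfe_beta and the noise-free toy data are hypothetical.

```python
import numpy as np


def twfe_beta(y: np.ndarray, d: np.ndarray) -> float:
    """TWFE coefficient on d for a balanced panel.

    y and d have shape (n_units, n_periods). Double demeaning removes the
    unit and time fixed effects; OLS on the demeaned data then gives beta.
    """

    def within(x: np.ndarray) -> np.ndarray:
        # x_it - mean over t (unit mean) - mean over i (period mean) + grand mean
        return x - x.mean(axis=1, keepdims=True) - x.mean(axis=0, keepdims=True) + x.mean()

    y_w, d_w = within(y), within(d)
    return float((d_w * y_w).sum() / (d_w**2).sum())


# Hypothetical noise-free toy panel: 3 units, 4 periods, constant effect 2.5.
unit_fe = np.array([1.0, -0.5, 2.0])[:, None]
time_fe = np.array([0.0, 0.3, 0.7, 1.2])[None, :]
d = np.array(
    [
        [0, 0, 1, 1],  # unit 0 treated from the third period on
        [0, 0, 0, 1],  # unit 1 treated in the last period only
        [0, 0, 0, 0],  # unit 2 never treated
    ],
    dtype=float,
)
y = unit_fe + time_fe + 2.5 * d
beta_hat = twfe_beta(y, d)  # recovers 2.5 up to floating-point error
```

In a balanced panel, double demeaning gives the same β as a regression with full sets of unit and time dummies (Frisch-Waugh-Lovell), which keeps the sketch short.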