bashtage/linearmodels

AbsorbingLS: debiasing of robust standard errors not occurring even though debiased=True

blakemas opened this issue · 2 comments

Hi there, love this package and this function. I came across a small issue when attempting to compute heteroskedasticity-robust standard errors with AbsorbingLS.fit, as detailed in the fit docs and in the docs for linearmodels.iv.covariance.HeteroskedasticCovariance. In particular, passing the keyword argument debiased=True does not alter the standard errors. Please see the MWE below. As a baseline, I have compared the output against statsmodels.api.OLS using the HC1_se standard errors, which are debiased but otherwise the same.
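For reference, my understanding (not something taken from the linearmodels docs) is that HC1 is simply the HC0 "robust" estimator rescaled by a finite-sample factor, so I expected debiased=True to multiply each standard error by sqrt(nobs / (nobs - k)), where k counts all estimated parameters, including the absorbed fixed effects.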

import numpy as np 
import pandas as pd
import linearmodels as lm
from statsmodels.api import OLS

print('Linear models version: {}'.format(lm.__version__))

### Make some toy data:
n = 1000
np.random.seed(100)
X = 10*np.random.randn(n)
# make some categorical fixed effects
n_cats = 50     # number of different categories
cats = np.array([i % n_cats for i in range(n)])
Y = X + cats + np.random.randn(n)


### Solve using AbsorbingLS with robust standard errors, without debiasing (baseline)
cats = pd.Series(cats).astype('category').to_frame()    # change to category type
model = lm.iv.absorbing.AbsorbingLS(Y, X, absorb=cats)
fit1 = model.fit(cov_type='robust')
print('Standard errors for robust SE without debiasing:')
print(fit1.std_errors['exog'])
print('debiased flag: {}'.format(fit1.debiased))
print(' ')

### Attempt to debias by passing debiased=True to .fit
model = lm.iv.absorbing.AbsorbingLS(Y, X, absorb=cats)
fit2 = model.fit(cov_type='robust', debiased=True)
print('Standard errors for robust SE with debiasing:')
print(fit2.std_errors['exog'])
print('debiased flag: {}'.format(fit1.debiased))
print(' ')

### Manually debias the standard errors by scaling by sqrt(nobs / (nobs - df_model))
scale = np.sqrt(fit1.nobs / (fit1.nobs - fit1.df_model))
print('Standard error after manually debiasing:')
print(fit1.std_errors['exog'] * scale)
print(' ')

### Comparison to Statsmodels OLS with pandas get_dummies
Z = pd.get_dummies(cats)
Z = np.hstack([np.expand_dims(X, axis=1), Z])
model = OLS(Y, Z)
fit3 = model.fit(cov_type='HC1')
print('Standard errors from Statsmodels HC1_se:')
print(fit3.bse[0])  # X is the first index in the parameters

which produces the following output on my local machine:

Linear models version: 4.27
Standard errors for robust SE without debiasing:
0.0032257636733263708
debiased flag: False

Standard errors for robust SE with debiasing:
0.0032273777658333455
debiased flag: False

Standard error after manually debiasing:
0.0033113069497783198

Standard errors from Statsmodels HC1_se:
0.0033113069497783224

In particular, we see that after manually adjusting it, the standard error produced by AbsorbingLS matches the output of statsmodels. However, the outputs with debiased=True and debiased=False are essentially identical, and neither matches statsmodels. Thank you again for such an awesome tool!

The only debiasing is w.r.t. the number of non-absorbed variables. The code doesn't try to count the number of absorbed regressors, since in any complicated model the absorbed regressor space will be singular. The estimation method doesn't care about the singularity and only ensures that the effect-purged X and effect-purged Y are orthogonal to the absorbed regressors.

This is why there is only a small change in the SE with and without debiasing, rather than a larger one like you get in statsmodels.
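To see this concretely with the numbers from the output above: the small change corresponds to removing only the single non-absorbed regressor from the degrees of freedom, whereas statsmodels removes all 51 estimated parameters (X plus the 50 category effects, which is what fit1.df_model counts in the MWE, given that the manual rescaling matches statsmodels exactly). A quick sketch reusing the values printed above (just arithmetic, not code from linearmodels):

import numpy as np

nobs = 1000
se_robust = 0.0032257636733263708    # AbsorbingLS, cov_type='robust', debiased=False
se_debiased = 0.0032273777658333455  # AbsorbingLS, cov_type='robust', debiased=True
se_hc1 = 0.0033113069497783224       # statsmodels HC1_se

# Debiasing w.r.t. the single non-absorbed regressor only
print(se_debiased / se_robust)       # ~1.0005004
print(np.sqrt(nobs / (nobs - 1)))    # ~1.0005004

# Debiasing w.r.t. all 51 estimated parameters, as statsmodels HC1 does
print(se_hc1 / se_robust)            # ~1.0265188
print(np.sqrt(nobs / (nobs - 51)))   # ~1.0265188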

Gotcha, thank you so much for clarifying!