ronikobrosly/causal-curve

Remove confounders influence

Closed this issue · 5 comments

Hi,

Many thanks for developing such a fantastic and much-needed tool.
However, I have run into some problems when dealing with confounders.
Here is a snippet to illustrate.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

from causal_curve import TMLE_Regressor

df = pd.DataFrame()
df['X_1'] = np.random.rand(200) * 10
df['Treatment'] = np.random.rand(200) * 10
df['Outcome'] = df['X_1']

reg = TMLE_Regressor()
reg.fit(T=df['Treatment'], X=df[['X_1']], y=df['Outcome'])
tmle_results = reg.calculate_CDRC(ci=0.95)

plt.plot(tmle_results['Treatment'], tmle_results['Causal_Dose_Response'], c='b')
plt.plot(tmle_results['Treatment'], tmle_results['Lower_CI'], c='g', ls='--')
plt.plot(tmle_results['Treatment'], tmle_results['Upper_CI'], c='g', ls='--')
plt.show()

The outcome is entirely determined by the confounder X_1, yet the CDRC looks like this:

(plot: estimated CDRC with 95% CI bands)

Sorry, I cannot interpret the graph correctly. In my opinion, the effect of the treatment should be zero in this case.

Please help me figure out what is going on.

Hi @AlexMa011 , thanks so much for your interest! This is a great question. If I understand your question, what is going on is that there is zero correlation between the treatment and covariate X_1 and perfect correlation between the covariate X_1 and the outcome. In such a case, there would be no confounding occurring. Confounding can only occur when there is a correlation between the treatment and covariate, and the covariate and the outcome.
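To make the "no confounding" point concrete, here is a quick sketch (numpy only, independent of causal-curve) that checks the two pairwise correlations in simulated data like yours. Confounding requires both to be non-zero; here only the covariate-outcome one is:

```python
import numpy as np

rng = np.random.default_rng(0)

x_1 = rng.random(200) * 10        # covariate
treatment = rng.random(200) * 10  # drawn independently of x_1
outcome = x_1.copy()              # fully determined by the covariate

# Confounding needs BOTH links: treatment<->covariate AND covariate<->outcome.
corr_t_x = np.corrcoef(treatment, x_1)[0, 1]  # near 0: no treatment-covariate link
corr_x_y = np.corrcoef(x_1, outcome)[0, 1]    # exactly 1: covariate-outcome link

print(f"corr(T, X_1) = {corr_t_x:.3f}, corr(X_1, Y) = {corr_x_y:.3f}")
```

Since `corr(T, X_1)` is essentially zero, X_1 is not a confounder here, and adjusting for it cannot change the treatment-outcome relationship.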

Because there is no confounding occurring, the causal curve is going to look like the raw, bivariate relationship between the treatment and the outcome. I put together some code below to show this. As you can see, the orange line is pretty close to the causal curve. Also, if you smooth the causal curve further, the slope is approximately zero (a horizontal line).

One issue here though is that the causal curves look a little too sensitive and need to be smoothed a little more. This can be done by changing some of the parameters in the GPS_Regressor and TMLE_Regressor tools.

Does this answer your question?

import matplotlib.pyplot as plt
import numpy as np 
import pandas as pd
import statsmodels.api as sm

from causal_curve import GPS_Regressor, TMLE_Regressor


# Make data
df = pd.DataFrame()
df['X_1'] = np.random.rand(200) * 10
df['Treatment'] = np.random.rand(200) * 10
df['Outcome'] = df['X_1']

# Fit models
gps = GPS_Regressor()
gps.fit(T = df['Treatment'], X = df[['X_1']], y = df['Outcome'])
gps_results = gps.calculate_CDRC(ci = 0.95)

tmle = TMLE_Regressor()
tmle.fit(T = df['Treatment'], X = df[['X_1']], y = df['Outcome'])
tmle_results = tmle.calculate_CDRC(ci = 0.95)

# Create LOESS-smoothed outcome (no covariate adjustment)
df2 = df.sort_values('Treatment', ascending = True, inplace = False)
lowess = sm.nonparametric.lowess
z = lowess(df2['Outcome'], df2['Treatment'], frac = 0.2)
treatment, smoothed_outcome = z[:,0], z[:,1] 

# Plot it all
plt.clf()
fig, axs = plt.subplots(2)
fig.suptitle('GPS and TMLE causal curves')

# GPS results
axs[0].plot(gps_results['Treatment'], gps_results['Causal_Dose_Response'], c='b') # causal-curve
axs[0].plot(treatment, smoothed_outcome, c='orange') # raw treatment and LOESS-smoothed outcome only
axs[0].plot(gps_results['Treatment'], gps_results['Lower_CI'], c='g', ls='--') # causal-curve 95% lower
axs[0].plot(gps_results['Treatment'], gps_results['Upper_CI'], c='g', ls='--') # causal-curve 95% upper

# TMLE results
axs[1].plot(tmle_results['Treatment'], tmle_results['Causal_Dose_Response'], c='b') # causal-curve
axs[1].plot(treatment, smoothed_outcome, c='orange') # raw treatment and LOESS-smoothed outcome only
axs[1].plot(tmle_results['Treatment'], tmle_results['Lower_CI'], c='g', ls='--') # causal-curve 95% lower
axs[1].plot(tmle_results['Treatment'], tmle_results['Upper_CI'], c='g', ls='--') # causal-curve 95% upper

plt.show()

(screenshot: GPS and TMLE causal curves with the LOESS-smoothed raw relationship overlaid)

Thanks for the quick response. You are totally right about confounding; this case does not involve confounding at all. Now I understand that the wiggles I saw were just random noise, not a real treatment effect.

Thanks again for developing such a great causal inference tool for continuous treatments.
I have run into another confusing case.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

from causal_curve import GPS_Regressor

df = pd.DataFrame()
df['X_1'] = np.arange(0, 100, 0.5)
df['Treatment'] = df['X_1'] + np.random.rand(200) * 0.5
df['Outcome'] = df['X_1']

reg = GPS_Regressor()
reg.fit(T=df['Treatment'], X=df[['X_1']], y=df['Outcome'])
gps_results = reg.calculate_CDRC(ci=0.95)

plt.plot(gps_results['Treatment'], gps_results['Causal_Dose_Response'], c='b')
plt.plot(gps_results['Treatment'], gps_results['Lower_CI'], c='g', ls='--')
plt.plot(gps_results['Treatment'], gps_results['Upper_CI'], c='g', ls='--')
plt.show()

I think this is a case of confounding, and the result looks like this:

(plot: estimated CDRC, roughly a 45-degree line)

The result seems good. But I think the treatment only appears to work because of selection bias. Is there a way to find out the "true effect" of the treatment?

In this case @AlexMa011, the treatment is perfectly correlated with both the covariate X_1 and the outcome, and the covariate is also perfectly correlated with the outcome. Because of this, it is impossible for any model to separate the treatment effect from the covariate effect. A 45-degree line over [0, 100] like this is expected.
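You can see the identifiability problem directly in the data. A short sketch (numpy/pandas only) reproducing your simulation: every pairwise correlation is essentially 1, so the outcome's variation can be attributed to the treatment or the covariate interchangeably, and no estimator can tell them apart:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

df = pd.DataFrame()
df['X_1'] = np.arange(0, 100, 0.5)
df['Treatment'] = df['X_1'] + rng.random(200) * 0.5  # treatment is X_1 plus tiny noise
df['Outcome'] = df['X_1']

# All three pairwise correlations are ~1: treatment and covariate are
# perfectly collinear, so the model cannot separate their contributions.
corr = df.corr()
print(corr.round(4))
```

In practice, identifying a treatment effect requires some independent variation in the treatment after conditioning on the covariates; here there is essentially none.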

Hopefully this helps @AlexMa011! Is it okay if I close the issue?

Thank you very much!