How to plot pg.partial_corr as scatterplot?
Closed this issue · 3 comments
I am using pg.partial_corr to calculate the partial Spearman rank correlation between my two variables of interest while controlling for the influence of three covariates. This makes my problem identical to the second example in the documentation ("2. Spearman partial correlation with several covariates"):
# Partial correlation of x and y controlling for cv1, cv2 and cv3
pg.partial_corr(data=df, x='x', y='y', covar=['cv1', 'cv2', 'cv3'],
method='spearman').round(3)
           n      r         CI95%  p-val
spearman  30  0.521  [0.18, 0.75]  0.005
How would I plot this as a scatterplot? One idea would be to convert the dataframe to ranks, regress x and y on the covariates, and then plot the two sets of residuals against each other. But I am not sure whether this would really be a good visualization of what is happening mathematically. For example, one could do:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf
# convert to ranks
df_rank = df.rank()
# Regress X and Y on confounders cv1,cv2,cv3
x_resid = smf.ols(formula="x ~ cv1 + cv2 + cv3", data=df_rank).fit().resid
y_resid = smf.ols(formula="y ~ cv1 + cv2 + cv3", data=df_rank).fit().resid
residual_df = pd.DataFrame({'X_resid': x_resid, 'Y_resid': y_resid})
# Plot the partial correlation using seaborn
plt.figure()
sns.regplot(x='X_resid', y='Y_resid', data=residual_df, ci=None)
plt.title("Partial Correlation Plot")
plt.xlabel("Residuals of X")
plt.ylabel("Residuals of Y")
plt.show()
What would be your recommendation on how to plot this?
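One way to check whether the rank-residual idea above reflects the underlying math is to compare it numerically with the textbook definition of a partial correlation. The sketch below uses synthetic data (not the `df` from this issue) and assumes that "partial Spearman" means the partial correlation of the rank-transformed variables, which I believe is how `pg.partial_corr` treats `method='spearman'`, though I have not verified that here. The helper names `rank` and `residuals` are made up for the example.

```python
import numpy as np

def rank(a):
    """Rank each column from 1 to n; assumes no ties (continuous data)."""
    return a.argsort(axis=0).argsort(axis=0) + 1.0

def residuals(target, covars):
    """Residuals of an OLS fit of target on covars plus an intercept."""
    design = np.column_stack([np.ones(len(target)), covars])
    beta, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ beta

# Synthetic stand-in for x, y and three covariates (assumption, not real data)
rng = np.random.default_rng(0)
n = 30
cv = rng.normal(size=(n, 3))
x = cv @ np.array([0.5, -0.3, 0.2]) + rng.normal(size=n)
y = 0.6 * x + cv @ np.array([0.1, 0.4, -0.2]) + rng.normal(size=n)

ranks = rank(np.column_stack([x, y, cv]))

# Route 1: Pearson correlation of the rank residuals (the idea from the question)
r_resid = np.corrcoef(
    residuals(ranks[:, 0], ranks[:, 2:]),
    residuals(ranks[:, 1], ranks[:, 2:]),
)[0, 1]

# Route 2: partial correlation from the inverse covariance matrix of the ranks
prec = np.linalg.inv(np.cov(ranks, rowvar=False))
r_prec = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])

print(r_resid, r_prec)
```

Both routes should print the same value up to floating-point error, so under the stated assumption the slope you see in a rank-residual scatterplot corresponds to the reported partial rho.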
Hey @JohannesWiesner,
I would probably just plot the standard Pearson partial correlation and annotate the plot with the (partial) Spearman rho:
sns.regplot(x='x', y='y', data=df, x_partial=['cv1', 'cv2', 'cv3'], y_partial=['cv1', 'cv2', 'cv3'])
@raphaelvallat : Thanks for the quick answer!
Using your approach, I seem to run into these two issues:
mwaskom/seaborn#2675
mwaskom/seaborn#458
But that should not be your concern, since this is more of a software issue :)
Ah right, I thought you could specify multiple ones in sns.regplot 🤦 ... Anyway, you can do the partial regression plot yourself, in which case I would just recommend using the raw data and not ranks.
Closing this issue but please feel free to reopen!
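For anyone landing here later, the do-it-yourself partial regression plot on raw data suggested above could look roughly like the following. This is only a sketch: the data are synthetic, `residualize` and `partial_regression_plot` are hypothetical helper names, and the annotated rho is computed from rank residuals on the assumption that this matches what `pg.partial_corr(..., method='spearman')` reports.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def residualize(df, target, covars):
    """Residuals of an OLS fit of df[target] on df[covars] plus an intercept."""
    t = df[target].to_numpy(dtype=float)
    design = np.column_stack([np.ones(len(df)), df[covars].to_numpy(dtype=float)])
    beta, *_ = np.linalg.lstsq(design, t, rcond=None)
    return t - design @ beta

def partial_regression_plot(df, x, y, covars, rho=None, ax=None):
    """Scatter the raw-data residuals of x and y and draw a least-squares line."""
    if ax is None:
        _, ax = plt.subplots()
    x_res = residualize(df, x, covars)
    y_res = residualize(df, y, covars)
    slope, intercept = np.polyfit(x_res, y_res, deg=1)
    grid = np.array([x_res.min(), x_res.max()])
    ax.scatter(x_res, y_res)
    ax.plot(grid, intercept + slope * grid)
    ax.set_xlabel(f"{x} residuals (given {', '.join(covars)})")
    ax.set_ylabel(f"{y} residuals (given {', '.join(covars)})")
    if rho is not None:
        # Annotate with the partial Spearman rho, as suggested in the thread
        ax.annotate(f"partial Spearman rho = {rho:.3f}",
                    xy=(0.05, 0.95), xycoords="axes fraction", va="top")
    return ax

# Synthetic stand-in for the data frame in the question (assumption)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(30, 5)),
                  columns=["x", "y", "cv1", "cv2", "cv3"])
covars = ["cv1", "cv2", "cv3"]

# Partial Spearman rho via rank residuals (assumed equivalent to pingouin's value)
df_rank = df.rank()
rho = np.corrcoef(residualize(df_rank, "x", covars),
                  residualize(df_rank, "y", covars))[0, 1]

ax = partial_regression_plot(df, "x", "y", covars, rho=rho)
```

Call `plt.show()` afterwards to display the figure. With pingouin installed, the `rho` passed to the plot could instead be read from the `r` column of the `pg.partial_corr` result.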