Partial correlation: checks whether covariate is identical to x or y
raphaelvallat opened this issue · 0 comments
Discussed in #371
Originally posted by m-guggenmos August 8, 2023
[I realized while writing this post that the issue is more subtle / less critical than I first thought, but perhaps it is useful anyway.]
I encountered the following situation (this is an MRE distilled from some more complex code):
```python
import numpy as np
import pingouin as pg
import pandas as pd

np.random.seed(0)
data = np.random.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], 1000)
df = pd.DataFrame(dict(x=data[:, 0], y=data[:, 1]))
df['z'] = df['y']

# I expected this correlation to be nan (or maybe 0):
pg.partial_corr(df, 'x', 'y', 'z')
# ..however:
#             n         r         CI95%         p-val
# pearson  1000  0.497099  [0.45, 0.54]  1.803982e-63

# and:
pg.partial_corr(df, 'x', 'y')
#             n         r         CI95%         p-val
# pearson  1000  0.497099  [0.45, 0.54]  1.564525e-63
```
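For intuition on why I expected nan: with the textbook partial-correlation formula, a covariate identical to `y` drives both the numerator and the denominator towards zero. A minimal sketch reusing `df` from the MRE above (this is just the standard formula, not necessarily how pingouin computes it internally):

```python
# Textbook partial-correlation formula (illustrative, not pingouin internals):
#     r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))
# With z identical to y, r_yz is 1, so the expression collapses to a 0/0 form.
import numpy as np

r_xy = np.corrcoef(df['x'], df['y'])[0, 1]
r_xz = np.corrcoef(df['x'], df['z'])[0, 1]  # same value as r_xy, since z == y
r_yz = np.corrcoef(df['y'], df['z'])[0, 1]  # 1 (up to floating-point rounding)

num = r_xy - r_xz * r_yz
den = np.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
print(num, den)   # both ~0
print(num / den)  # nan or numerically meaningless, depending on rounding
```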
The use case here is a (sort of) control analysis, in which I use `pg.partial_corr` to correct a correlation matrix for one of its factors (let's say factor `z`). I would have expected all partial correlations involving `z`, when controlling for `z`, to be nan or possibly 0, but in the code above the result of `pg.partial_corr(df, 'x', 'y', 'z')`, where `df['y']` equals `df['z']`, is virtually the same as `pg.partial_corr(df, 'x', 'y')`, as if `z` were not partialed out at all.
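For concreteness, a rough sketch of this control analysis; the column list and the loop are simplified, hypothetical stand-ins for my actual code:

```python
# Hypothetical sketch of the control analysis: partial out factor 'z' from every
# pairwise correlation. A pair involving a column identical to 'z' silently
# returns (almost) the uncorrected correlation instead of nan.
import itertools
import numpy as np
import pandas as pd
import pingouin as pg

cols = ['x', 'y']  # the real analysis has more factors
corrected = pd.DataFrame(np.eye(len(cols)), index=cols, columns=cols)

for a, b in itertools.combinations(cols, 2):
    r = pg.partial_corr(df, x=a, y=b, covar='z').loc['pearson', 'r']
    corrected.loc[a, b] = corrected.loc[b, a] = r

print(corrected)  # the (x, y) entry is ~0.497, i.e. z was effectively ignored
```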
With some testing I realized that, had I programmed it as `pg.partial_corr(df, 'x', 'z', 'z')`, I would have received the error message `AssertionError: y and covar must be independent`. So, long story short, I wonder whether, instead of only asserting `x != covar` and `y != covar`, one could:

- also test something like `not np.allclose(data[y], data[covar])` and `not np.allclose(data[x], data[covar])`, and
- if one of these checks fails (i.e. the covariate is numerically identical to x or y), return a `nan` correlation with a warning about non-independence instead of raising an assertion error (roughly as sketched below).
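A minimal sketch of the kind of guard I have in mind (the function name `check_covar_independence` and the warning text are purely illustrative, not a patch against pingouin's actual internals):

```python
# Rough sketch of the suggested guard (illustrative only, not pingouin code):
# if the covariate is numerically identical to x or y, warn and let the caller
# return nan instead of silently computing a misleading partial correlation.
import warnings
import numpy as np
import pandas as pd

def check_covar_independence(data: pd.DataFrame, x: str, y: str, covar: str) -> bool:
    """Return True if covar is numerically distinct from both x and y."""
    for col in (x, y):
        if np.allclose(data[col], data[covar]):
            warnings.warn(
                f"Column '{col}' and covariate '{covar}' are numerically identical; "
                "the partial correlation is undefined and nan will be returned."
            )
            return False
    return True

# Usage with the MRE above: df['z'] is identical to df['y'], so this warns,
# and the caller could return a nan correlation instead of ~0.5.
check_covar_independence(df, 'x', 'y', 'z')  # False (with a UserWarning)
```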