raphaelvallat/pingouin

Partial correlation: checks whether covariate is identical to x or y

raphaelvallat opened this issue · 0 comments

Discussed in #371

Originally posted by m-guggenmos August 8, 2023
[While writing this post I realized that this issue is more subtle / less critical than I first thought, but perhaps it is useful anyway.]

I encountered the following situation (this is an MRE distilled from some more complex code):

import numpy as np
import pingouin as pg
import pandas as pd
np.random.seed(0)

data = np.random.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], 1000)
df = pd.DataFrame(dict(x=data[:, 0], y=data[:, 1]))
df['z'] = df['y']

# I expected this correlation to be nan (or maybe 0):
pg.partial_corr(df, 'x', 'y', 'z')
# ..however:
#             n         r         CI95%         p-val
# pearson  1000  0.497099  [0.45, 0.54]  1.803982e-63
# and:
pg.partial_corr(df, 'x', 'y')
#            n         r         CI95%         p-val
# pearson  1000  0.497099  [0.45, 0.54]  1.564525e-63

The use case is a (sort of control) analysis in which I use pg.partial_corr to correct a correlation matrix for one of its factors (say, factor z). I would have expected all partial correlations involving z, when controlling for z, to be nan or possibly 0. In the above code, however, the result of pg.partial_corr(df, 'x', 'y', 'z') with df['z'] = df['y'] is nearly identical to that of pg.partial_corr(df, 'x', 'y') (same r and CI, slightly different p-value), as if z were not partialed out at all.
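For reference, the expectation of nan can be read off the textbook first-order partial correlation formula. This is the standard definition and not necessarily how pingouin computes the value internally; the symbols r_xy, r_xz and r_yz denote the pairwise Pearson correlations.

```latex
r_{xy \cdot z} = \frac{r_{xy} - r_{xz}\, r_{yz}}{\sqrt{\left(1 - r_{xz}^{2}\right)\left(1 - r_{yz}^{2}\right)}}

% With z = y we have r_{yz} = 1 and r_{xz} = r_{xy}, so
r_{xy \cdot y} = \frac{r_{xy} - r_{xy} \cdot 1}{\sqrt{\left(1 - r_{xy}^{2}\right)\left(1 - 1\right)}} = \frac{0}{0}
```

Both numerator and denominator vanish, so the quantity is undefined rather than 0, which is why nan seems the more natural return value.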

With some testing I realized that, had I written pg.partial_corr(df, 'x', 'z', 'z'), I would have received the error message AssertionError: y and covar must be independent. So, long story short, I wonder whether, instead of only asserting that the column names differ (x != covar and y != covar), one could

  1. also test something like not np.allclose(data['y'], data['covar']) and not np.allclose(data['x'], data['covar']) and
  2. if one of these conditions is met, instead of raising an assertion error, return a nan correlation with a warning about non-independence.
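The two points above could be sketched roughly as follows. This is only an illustration of the proposal, not pingouin code: the helper names duplicated_covariates and partial_corr_or_nan are hypothetical, and the correlation function is passed in as corr_func so that the sketch depends only on NumPy and pandas.

```python
import warnings

import numpy as np
import pandas as pd


def duplicated_covariates(data, x, y, covar):
    """Return (variable, covariate) pairs whose columns are numerically identical.

    Hypothetical helper: compares values with np.allclose instead of
    comparing column names only.
    """
    covars = [covar] if isinstance(covar, str) else list(covar)
    return [
        (var, c)
        for var in (x, y)
        for c in covars
        if np.allclose(data[var], data[c])
    ]


def partial_corr_or_nan(data, x, y, covar, corr_func):
    """Call corr_func(data, x, y, covar) unless x or y duplicates a covariate.

    In the duplicate case, emit a warning and return nan instead of
    raising an AssertionError (assumed behavior, as proposed above).
    """
    pairs = duplicated_covariates(data, x, y, covar)
    if pairs:
        warnings.warn(
            f"x or y is numerically identical to a covariate: {pairs}; "
            "returning nan.",
            RuntimeWarning,
        )
        return np.nan
    return corr_func(data, x, y, covar)
```

With the MRE above, partial_corr_or_nan(df, 'x', 'y', 'z', pg.partial_corr) would warn and return nan, while a call with a genuinely different covariate would be passed through to pg.partial_corr unchanged. Note that np.allclose works with a tolerance, so nearly identical columns are flagged as well, and columns containing NaN are never flagged with the default settings.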