Bug on Differential Expression Analysis for CCC & Downstream Signalling Networks Vignette
maximelepetit opened this issue · 3 comments
Hello,
Thank you very much for maintaining and improving this package, it is extremely useful and interesting for our research.
I want to report a bug when running the differential analysis vignette.
At this steps after running deseq2 on pseudo-bulk profiles:
# concat results across cell types
dea_df = pd.concat(dea_results)
dea_df = dea_df.reset_index().rename(columns={'level_0': groupby}).set_index('index')
dea_df.head()
I have this error :
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/tmp/ipykernel_1947289/4220832029.py in ?()
1 # concat results across cell types
2 dea_df = pd.concat(dea_results)
----> 3 dea_df = dea_df.reset_index().rename(columns={'level_0': groupby}).set_index('index')
4 dea_df.head()
~/miniconda3/envs/liana-env/lib/python3.11/site-packages/pandas/core/frame.py in ?(self, keys, drop, append, inplace, verify_integrity)
6102 if not found:
6103 missing.append(col)
6104
6105 if missing:
-> 6106 raise KeyError(f"None of {missing} are in the columns")
6107
6108 if inplace:
6109 frame = self
KeyError: "None of ['index'] are in the columns"
This error can be fix like this :
# concat results across cell types
dea_df = pd.concat(dea_results)
dea_df = dea_df.reset_index().rename(columns={'level_0': groupby,'level_1':'index'}).set_index('index')
dea_df.head()
The results output looks like this :
I noticed that NaN can be introduced on dea_df.
When remove them with :
# concat results across cell types
dea_df = pd.concat(dea_results)
dea_df = dea_df.reset_index().rename(columns={'level_0': groupby,'level_1':'index'}).set_index('index').dropna()
dea_df
It represent arounds 4000 rows.
Maxime
Hi @maximelepetit,
Thanks for the issue and for using liana. I will update this line the tutorial :)
Though, I'm really not sure why PyDESeq2 returns NaNs in those cases...
I have opened an issue for this. Perhaps, I'm missing something but best to double check:
owkin/PyDESeq2#291
I would not do .dropna()
in the tutorial by default, as it might accidentally hide some major issues.
Hi @maximelepetit,
As the maintainer of PyDeseq2 responded, there might be a couple of reasons for the NaN: the cooks_filter
from DESEQ2, and the other potential one was a float instead of int issue.
I will close this issue as it seems to be clarified or resolved by the recent PR in PyDESeq2.
Hi @dbdimitrov ,
Thanks for the reply,
1 - I upgrade PyDESeq2, now i'm using v0.4.10 (the latest release).
I notice that this version intoduced more NaN in padj column compare to my previous post:
len(dea_df[dea_df.isna().any(axis=1)])
9403
When i check distribution of pvalue, i noticed that most of pvalues are non significant eg > 0.05 (before adjustement) :
dea_na = dea_df[dea_df.isna().any(axis=1)]
len(dea_na[dea_na['pvalue'] > 0.05])
8733
But some of them have pvalues significants < 0.05 or even more significant < 0.005 :
len(dea_na[dea_na['pvalue'] < 0.05])
670
len(dea_na[dea_na['pvalue'] < 0.005])
100
Is it appropriate to remove all these lines?
2 - Without removing these line i ran the following lines :
adata = adata[adata.obs[condition_key]=='Hx'].copy()
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
lr_res = li.multi.df_to_lr(adata,
dea_df=dea_df,
resource_name='mouseconsensus', # NOTE: uses HUMAN gene symbols!
expr_prop=0.1, # calculated for adata as passed - used to filter interactions
groupby=groupby,
stat_keys=['stat', 'pvalue', 'padj'],
use_raw=False,
complex_col='stat', # NOTE: we use the Wald Stat to deal with complexes
verbose=True,
return_all_lrs=False,
)
First i want to focus on interaction that both the ligand and receptor are deregulated in the same direction with padj < 0.05:
lr_res_signif = lr_res[lr_res['interaction_padj'] < 0.05]
lr_res_signif
I noticed that in the case when i kept NaN in dea_df, NaN are also introduced in lr_res in column receptor_padj or ligand_padj .
This interactions are pertinent ? i'm a bit confused ...
Maxime