lilab-bcb/pegasus

pegasus.fgsea NOT work with MOUSE genome

LNGDingj opened this issue · 4 comments

Hello, @bli25 @yihming

The pegasus.fgsea does NOT work with MOUSE genome.
I got errors while I was running pegasus.fgsea with pseudo-bulk data from pegasus.deseq2, could you help with this? Thanks so much!

Background information:
Cloud Environment: R/Bioconductor: (Python 3.7.12, R4.1.2, Bioconductor 3.1.4, tidyverse 1.3.1)
Pegasus 1.5.1 installed from GitHub
Mouse snRNASeq
My results of pseudo-bulk data(pseudo) from pegasus.deseq2 and plots from pegasus.pseudo.volcano look reasonable

Error messages from calling pg.fgsea(pseudo, 'log2FoldChange', 'canonical_pathways', 'deseq2', fgsea_key = 'fgsea_deseq2')

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_8120/1566834265.py in <module>
----> 1 pg.fgsea(pseudo, 'log2FoldChange', 'canonical_pathways', 'deseq2', fgsea_key = 'fgsea_deseq2')

~/.local/lib/python3.7/site-packages/pegasusio/decorators.py in wrapper_timer(*args, **kwargs)
     10                 def wrapper_timer(*args, **kwargs):
     11                         start = time.perf_counter()
---> 12                         result = func(*args, **kwargs)
     13                         end = time.perf_counter()
     14                         message = f"Function '{func.__name__}' finished in {{:.{precision}f}}s.".format(end - start)

~/.local/lib/python3.7/site-packages/pegasus/tools/fgsea.py in fgsea(data, log2fc_key, pathways, de_key, minSize, maxSize, nproc, seed, fgsea_key)
    102     )
    103     with localconverter(ro.default_converter + pandas2ri.converter):
--> 104         res_df = ro.conversion.rpy2py(unlist(res))
    105     res_df.sort_values("padj", inplace=True)
    106     data.uns[fgsea_key] = res_df

~/.local/lib/python3.7/site-packages/rpy2/robjects/functions.py in __call__(self, *args, **kwargs)
    197                 kwargs[r_k] = v
    198         return (super(SignatureTranslatedFunction, self)
--> 199                 .__call__(*args, **kwargs))
    200 
    201 

~/.local/lib/python3.7/site-packages/rpy2/robjects/functions.py in __call__(self, *args, **kwargs)
    124                 new_kwargs[k] = conversion.py2rpy(v)
    125         res = super(Function, self).__call__(*new_args, **new_kwargs)
--> 126         res = conversion.rpy2py(res)
    127         return res
    128 

/opt/conda/lib/python3.7/functools.py in wrapper(*args, **kw)
    838                             '1 positional argument')
    839 
--> 840         return dispatch(args[0].__class__)(*args, **kw)
    841 
    842     funcname = getattr(func, '__name__', 'singledispatch function')

~/.local/lib/python3.7/site-packages/rpy2/robjects/pandas2ri.py in rpy2py_listvector(obj)
    243 def rpy2py_listvector(obj):
    244     if 'data.frame' in obj.rclass:
--> 245         res = rpy2py(DataFrame(obj))
    246     else:
    247         res = numpy2ri.rpy2py(obj)

/opt/conda/lib/python3.7/functools.py in wrapper(*args, **kw)
    838                             '1 positional argument')
    839 
--> 840         return dispatch(args[0].__class__)(*args, **kw)
    841 
    842     funcname = getattr(func, '__name__', 'singledispatch function')

~/.local/lib/python3.7/site-packages/rpy2/robjects/pandas2ri.py in rpy2py_dataframe(obj)
    252 def rpy2py_dataframe(obj):
    253     items = OrderedDict((k, rpy2py(v) if isinstance(v, Sexp) else v)
--> 254                         for k, v in obj.items())
    255     res = PandasDataFrame.from_dict(items)
    256     res.index = obj.rownames

~/.local/lib/python3.7/site-packages/rpy2/robjects/pandas2ri.py in <genexpr>(.0)
    252 def rpy2py_dataframe(obj):
    253     items = OrderedDict((k, rpy2py(v) if isinstance(v, Sexp) else v)
--> 254                         for k, v in obj.items())
    255     res = PandasDataFrame.from_dict(items)
    256     res.index = obj.rownames

/opt/conda/lib/python3.7/functools.py in wrapper(*args, **kw)
    838                             '1 positional argument')
    839 
--> 840         return dispatch(args[0].__class__)(*args, **kw)
    841 
    842     funcname = getattr(func, '__name__', 'singledispatch function')

~/.local/lib/python3.7/site-packages/rpy2/robjects/pandas2ri.py in rpy2py_dataframe(obj)
    254                         for k, v in obj.items())
    255     res = PandasDataFrame.from_dict(items)
--> 256     res.index = obj.rownames
    257     return res
    258 

/opt/conda/lib/python3.7/site-packages/pandas/core/generic.py in __setattr__(self, name, value)
   5498         try:
   5499             object.__getattribute__(self, name)
-> 5500             return object.__setattr__(self, name, value)
   5501         except AttributeError:
   5502             pass

/opt/conda/lib/python3.7/site-packages/pandas/_libs/properties.pyx in pandas._libs.properties.AxisProperty.__set__()

/opt/conda/lib/python3.7/site-packages/pandas/core/generic.py in _set_axis(self, axis, labels)
    763 
    764     def _set_axis(self, axis: int, labels: Index) -> None:
--> 765         labels = ensure_index(labels)
    766         self._mgr.set_axis(axis, labels)
    767         self._clear_item_cache()

/opt/conda/lib/python3.7/site-packages/pandas/core/indexes/base.py in ensure_index(index_like, copy)
   6334     else:
   6335 
-> 6336         return Index(index_like, copy=copy)
   6337 
   6338 

/opt/conda/lib/python3.7/site-packages/pandas/core/indexes/base.py in __new__(cls, data, dtype, copy, name, tupleize_cols, **kwargs)
    492             # other iterable of some kind
    493 
--> 494             subarr = com.asarray_tuplesafe(data, dtype=np.dtype("object"))
    495             return Index(subarr, dtype=dtype, copy=copy, name=name, **kwargs)
    496 

/opt/conda/lib/python3.7/site-packages/pandas/core/common.py in asarray_tuplesafe(values, dtype)
    225 
    226     if not (isinstance(values, (list, tuple)) or hasattr(values, "__array__")):
--> 227         values = list(values)
    228     elif isinstance(values, ABCIndex):
    229         # error: Incompatible return value type (got "Union[ExtensionArray, ndarray]",

TypeError: 'NULLType' object is not iterable

Hi @LNGDingj ,

May I check a bunch of things:

  • Do you have fgsea package installed in your R environment?
  • Does key deseq2 exist in pseudo.varm?
  • If so, then does key log2FoldChange exist in pseudo.varm['deseq2']? You may construct a pandas data frame by
df = pd.DataFrame(pseudo.varm['deseq2'], index=pseudo.var_names)

and then check if log2FoldChange exists in df.columns.

If everything looks good above, could you share the output of the following code:

from pegasus.tools import predefined_pathways, load_signatures_from_file
import rpy2.robjects as ro
from rpy2.robjects.packages import importr
fgsea = importr("fgsea")
pwdict = load_signatures_from_file(predefined_pathways.get('canonical_pathways', 'canonical_pathways'))
pathways_r = ro.ListVector(pwdict)
log2fc = ro.FloatVector(pseudo.varm['deseq2']['log2FoldChange'])
log2fc.names = ro.StrVector(pseudo.var_names)
res = fgsea.fgsea(pathways_r, log2fc, minSize=15, maxSize=500, nproc=0)
print(res)

Sincerely,
Yiming

Hi @yihming ,

Thanks a lot for your helps.

The issue that I reported is very likely caused by Mouse Genetic Nomenclature (The gene symbols for mice have the first letter capitalized followed by lower case letter) which is different from Human Genetic Nomenclature, I attached gene symbols for mouse genome.
mouse_gene_symbols.txt

Here are the information you need,
Do you have fgsea package installed in your R environment?
--- Yes, the installation is without any problems, since all function calls in "Pseudobulk Analysis Tutorial" can be repeated with human genome data 'MantonBM_nonmix_subset.zarr.zip'

Does key deseq2 exist in pseudo.varm?
--- Yes, 'deseq2' is in pseudo.varm

If so, then does key log2FoldChange exist in pseudo.varm['deseq2']? You may construct a pandas data frame by
--- Yes, key log2FoldChange exist in pseudo.varm['deseq2']

print(res)
--- Empty data.table (0 rows and 8 cols): pathway,pval,padj,log2err,ES,NES...

I see.

The preset gene sets that Pegasus provides are from MSigDB, and all MSigDB gene sets consist of human gene symbols.

To make GSEA work for your mouse data, you may find this discussion helpful.

Besides, it seems that MSigDB provides some mouse gene sets here. However, it's still in DRAFT status, and we haven't tested them.

If you have gene set gmt file yourself, you can directly use it by specifying its file path in pathways parameter of fgsea function.

Otherwise, if you really want to compare your data with human genes, you may make the gene names in your data (i.e. pseudo.var_names) all capitalized to "cheat" the system.

Sincerely,
Yiming

Thanks a lot for all your helps!