pegasus.fgsea NOT work with MOUSE genome
LNGDingj opened this issue · 4 comments
The pegasus.fgsea does NOT work with MOUSE genome.
I got errors while I was running pegasus.fgsea with pseudo-bulk data from pegasus.deseq2, could you help with this? Thanks so much!
Background information:
Cloud Environment: R/Bioconductor: (Python 3.7.12, R4.1.2, Bioconductor 3.1.4, tidyverse 1.3.1)
Pegasus 1.5.1 installed from GitHub
Mouse snRNASeq
My results of pseudo-bulk data(pseudo) from pegasus.deseq2 and plots from pegasus.pseudo.volcano look reasonable
Error messages from calling pg.fgsea(pseudo, 'log2FoldChange', 'canonical_pathways', 'deseq2', fgsea_key = 'fgsea_deseq2')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/tmp/ipykernel_8120/1566834265.py in <module>
----> 1 pg.fgsea(pseudo, 'log2FoldChange', 'canonical_pathways', 'deseq2', fgsea_key = 'fgsea_deseq2')
~/.local/lib/python3.7/site-packages/pegasusio/decorators.py in wrapper_timer(*args, **kwargs)
10 def wrapper_timer(*args, **kwargs):
11 start = time.perf_counter()
---> 12 result = func(*args, **kwargs)
13 end = time.perf_counter()
14 message = f"Function '{func.__name__}' finished in {{:.{precision}f}}s.".format(end - start)
~/.local/lib/python3.7/site-packages/pegasus/tools/fgsea.py in fgsea(data, log2fc_key, pathways, de_key, minSize, maxSize, nproc, seed, fgsea_key)
102 )
103 with localconverter(ro.default_converter + pandas2ri.converter):
--> 104 res_df = ro.conversion.rpy2py(unlist(res))
105 res_df.sort_values("padj", inplace=True)
106 data.uns[fgsea_key] = res_df
~/.local/lib/python3.7/site-packages/rpy2/robjects/functions.py in __call__(self, *args, **kwargs)
197 kwargs[r_k] = v
198 return (super(SignatureTranslatedFunction, self)
--> 199 .__call__(*args, **kwargs))
200
201
~/.local/lib/python3.7/site-packages/rpy2/robjects/functions.py in __call__(self, *args, **kwargs)
124 new_kwargs[k] = conversion.py2rpy(v)
125 res = super(Function, self).__call__(*new_args, **new_kwargs)
--> 126 res = conversion.rpy2py(res)
127 return res
128
/opt/conda/lib/python3.7/functools.py in wrapper(*args, **kw)
838 '1 positional argument')
839
--> 840 return dispatch(args[0].__class__)(*args, **kw)
841
842 funcname = getattr(func, '__name__', 'singledispatch function')
~/.local/lib/python3.7/site-packages/rpy2/robjects/pandas2ri.py in rpy2py_listvector(obj)
243 def rpy2py_listvector(obj):
244 if 'data.frame' in obj.rclass:
--> 245 res = rpy2py(DataFrame(obj))
246 else:
247 res = numpy2ri.rpy2py(obj)
/opt/conda/lib/python3.7/functools.py in wrapper(*args, **kw)
838 '1 positional argument')
839
--> 840 return dispatch(args[0].__class__)(*args, **kw)
841
842 funcname = getattr(func, '__name__', 'singledispatch function')
~/.local/lib/python3.7/site-packages/rpy2/robjects/pandas2ri.py in rpy2py_dataframe(obj)
252 def rpy2py_dataframe(obj):
253 items = OrderedDict((k, rpy2py(v) if isinstance(v, Sexp) else v)
--> 254 for k, v in obj.items())
255 res = PandasDataFrame.from_dict(items)
256 res.index = obj.rownames
~/.local/lib/python3.7/site-packages/rpy2/robjects/pandas2ri.py in <genexpr>(.0)
252 def rpy2py_dataframe(obj):
253 items = OrderedDict((k, rpy2py(v) if isinstance(v, Sexp) else v)
--> 254 for k, v in obj.items())
255 res = PandasDataFrame.from_dict(items)
256 res.index = obj.rownames
/opt/conda/lib/python3.7/functools.py in wrapper(*args, **kw)
838 '1 positional argument')
839
--> 840 return dispatch(args[0].__class__)(*args, **kw)
841
842 funcname = getattr(func, '__name__', 'singledispatch function')
~/.local/lib/python3.7/site-packages/rpy2/robjects/pandas2ri.py in rpy2py_dataframe(obj)
254 for k, v in obj.items())
255 res = PandasDataFrame.from_dict(items)
--> 256 res.index = obj.rownames
257 return res
258
/opt/conda/lib/python3.7/site-packages/pandas/core/generic.py in __setattr__(self, name, value)
5498 try:
5499 object.__getattribute__(self, name)
-> 5500 return object.__setattr__(self, name, value)
5501 except AttributeError:
5502 pass
/opt/conda/lib/python3.7/site-packages/pandas/_libs/properties.pyx in pandas._libs.properties.AxisProperty.__set__()
/opt/conda/lib/python3.7/site-packages/pandas/core/generic.py in _set_axis(self, axis, labels)
763
764 def _set_axis(self, axis: int, labels: Index) -> None:
--> 765 labels = ensure_index(labels)
766 self._mgr.set_axis(axis, labels)
767 self._clear_item_cache()
/opt/conda/lib/python3.7/site-packages/pandas/core/indexes/base.py in ensure_index(index_like, copy)
6334 else:
6335
-> 6336 return Index(index_like, copy=copy)
6337
6338
/opt/conda/lib/python3.7/site-packages/pandas/core/indexes/base.py in __new__(cls, data, dtype, copy, name, tupleize_cols, **kwargs)
492 # other iterable of some kind
493
--> 494 subarr = com.asarray_tuplesafe(data, dtype=np.dtype("object"))
495 return Index(subarr, dtype=dtype, copy=copy, name=name, **kwargs)
496
/opt/conda/lib/python3.7/site-packages/pandas/core/common.py in asarray_tuplesafe(values, dtype)
225
226 if not (isinstance(values, (list, tuple)) or hasattr(values, "__array__")):
--> 227 values = list(values)
228 elif isinstance(values, ABCIndex):
229 # error: Incompatible return value type (got "Union[ExtensionArray, ndarray]",
TypeError: 'NULLType' object is not iterable
Hi @LNGDingj ,
May I check a bunch of things:
- Do you have
fgsea
package installed in your R environment? - Does key
deseq2
exist inpseudo.varm
? - If so, then does key
log2FoldChange
exist inpseudo.varm['deseq2']
? You may construct a pandas data frame by
df = pd.DataFrame(pseudo.varm['deseq2'], index=pseudo.var_names)
and then check if log2FoldChange
exists in df.columns
.
If everything looks good above, could you share the output of the following code:
from pegasus.tools import predefined_pathways, load_signatures_from_file
import rpy2.robjects as ro
from rpy2.robjects.packages import importr
fgsea = importr("fgsea")
pwdict = load_signatures_from_file(predefined_pathways.get('canonical_pathways', 'canonical_pathways'))
pathways_r = ro.ListVector(pwdict)
log2fc = ro.FloatVector(pseudo.varm['deseq2']['log2FoldChange'])
log2fc.names = ro.StrVector(pseudo.var_names)
res = fgsea.fgsea(pathways_r, log2fc, minSize=15, maxSize=500, nproc=0)
print(res)
Sincerely,
Yiming
Hi @yihming ,
Thanks a lot for your helps.
The issue that I reported is very likely caused by Mouse Genetic Nomenclature (The gene symbols for mice have the first letter capitalized followed by lower case letter) which is different from Human Genetic Nomenclature, I attached gene symbols for mouse genome.
mouse_gene_symbols.txt
Here are the information you need,
Do you have fgsea package installed in your R environment?
--- Yes, the installation is without any problems, since all function calls in "Pseudobulk Analysis Tutorial" can be repeated with human genome data 'MantonBM_nonmix_subset.zarr.zip'
Does key deseq2 exist in pseudo.varm?
--- Yes, 'deseq2' is in pseudo.varm
If so, then does key log2FoldChange exist in pseudo.varm['deseq2']? You may construct a pandas data frame by
--- Yes, key log2FoldChange exist in pseudo.varm['deseq2']
print(res)
--- Empty data.table (0 rows and 8 cols): pathway,pval,padj,log2err,ES,NES...
I see.
The preset gene sets that Pegasus provides are from MSigDB, and all MSigDB gene sets consist of human gene symbols.
To make GSEA work for your mouse data, you may find this discussion helpful.
Besides, it seems that MSigDB provides some mouse gene sets here. However, it's still in DRAFT status, and we haven't tested them.
If you have gene set gmt
file yourself, you can directly use it by specifying its file path in pathways
parameter of fgsea
function.
Otherwise, if you really want to compare your data with human genes, you may make the gene names in your data (i.e. pseudo.var_names
) all capitalized to "cheat" the system.
Sincerely,
Yiming
Thanks a lot for all your helps!