theislab/anndata2ri

Conversion to R fails when adata.var is not empty

climouse opened this issue · 2 comments

Thanks for the great work, it's awesome to be able to go back and forth between scanpy and seurat!

I am having some issues when the anndata object contains a non-empty .var slot.
For example:

data #my anndata object

#AnnData object with n_obs × n_vars = 3818 × 5127 
#    obs: 'gene_symbols', 'type', 'chr', 'mybatch', 'celltype', 'n_genes', 'n_counts', 'n_targets'
#    var: 'gene_symbols', 'type', 'chr', 'n_cells', 'n_counts', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
#    uns: 'log1p'

Converting this anndata object throws an error:

%%R -i data
print(data)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-101-8763aa13d87e> in <module>()
----> 1 get_ipython().run_cell_magic('R', '-i data', 'print(data)')

/share/software/user/open/py-jupyter/1.0.0_py36/lib/python3.6/site-packages/IPython/core/interactiveshell.py in run_cell_magic(self, magic_name, line, cell)
   2101             magic_arg_s = self.var_expand(line, stack_depth)
   2102             with self.builtin_trap:
-> 2103                 result = fn(magic_arg_s, cell)
   2104             return result
   2105 

<decorator-gen-129> in R(self, line, cell, local_ns)

/share/software/user/open/py-jupyter/1.0.0_py36/lib/python3.6/site-packages/IPython/core/magic.py in <lambda>(f, *a, **k)
    185     # but it's overkill for just that one bit of state.
    186     def magic_deco(arg):
--> 187         call = lambda f, *a, **k: f(*a, **k)
    188 
    189         if callable(arg):

/share/PI/astraigh/miniconda3/envs/scanpy/lib/python3.6/site-packages/rpy2/ipython/rmagic.py in R(self, line, cell, local_ns)
    721                         raise NameError("name '%s' is not defined" % input)
    722                 with localconverter(converter) as cv:
--> 723                     ro.r.assign(input, val)
    724 
    725         tmpd = self.setup_graphics(args)

/share/PI/astraigh/miniconda3/envs/scanpy/lib/python3.6/site-packages/rpy2/robjects/functions.py in __call__(self, *args, **kwargs)
    190                 kwargs[r_k] = v
    191         return (super(SignatureTranslatedFunction, self)
--> 192                 .__call__(*args, **kwargs))
    193 
    194 

/share/PI/astraigh/miniconda3/envs/scanpy/lib/python3.6/site-packages/rpy2/robjects/functions.py in __call__(self, *args, **kwargs)
    111 
    112     def __call__(self, *args, **kwargs):
--> 113         new_args = [conversion.py2rpy(a) for a in args]
    114         new_kwargs = {}
    115         for k, v in kwargs.items():

/share/PI/astraigh/miniconda3/envs/scanpy/lib/python3.6/site-packages/rpy2/robjects/functions.py in <listcomp>(.0)
    111 
    112     def __call__(self, *args, **kwargs):
--> 113         new_args = [conversion.py2rpy(a) for a in args]
    114         new_kwargs = {}
    115         for k, v in kwargs.items():

/share/PI/astraigh/miniconda3/envs/scanpy/lib/python3.6/functools.py in wrapper(*args, **kw)
    805                             '1 positional argument')
    806 
--> 807         return dispatch(args[0].__class__)(*args, **kw)
    808 
    809     funcname = getattr(func, '__name__', 'singledispatch function')

/share/PI/astraigh/miniconda3/envs/scanpy/lib/python3.6/site-packages/anndata2ri/py2r.py in py2rpy_anndata(obj)
     57         assays = ListVector({**x, **layers})
     58 
---> 59         row_args = {k: pandas2ri.py2rpy(v) for k, v in obj.var.items()}
     60         if check_no_dupes(obj.var_names, "var_names"):
     61             row_args["row.names"] = pandas2ri.py2rpy(obj.var_names)

/share/PI/astraigh/miniconda3/envs/scanpy/lib/python3.6/site-packages/anndata2ri/py2r.py in <dictcomp>(.0)
     57         assays = ListVector({**x, **layers})
     58 
---> 59         row_args = {k: pandas2ri.py2rpy(v) for k, v in obj.var.items()}
     60         if check_no_dupes(obj.var_names, "var_names"):
     61             row_args["row.names"] = pandas2ri.py2rpy(obj.var_names)

/share/PI/astraigh/miniconda3/envs/scanpy/lib/python3.6/functools.py in wrapper(*args, **kw)
    805                             '1 positional argument')
    806 
--> 807         return dispatch(args[0].__class__)(*args, **kw)
    808 
    809     funcname = getattr(func, '__name__', 'singledispatch function')

/share/PI/astraigh/miniconda3/envs/scanpy/lib/python3.6/site-packages/rpy2/robjects/pandas2ri.py in py2rpy_pandasseries(obj)
    124                 continue
    125             if type(x) is not homogeneous_type:
--> 126                 raise ValueError('Series can only be of one type, or None.')
    127         # TODO: Could this be merged with obj.type.name == 'O' case above ?
    128         res = {

ValueError: Series can only be of one type, or None.

Now, if I remove the .var slot from that same anndata object, the conversion works:

data_novars=annd.AnnData(X=data.X, obs=data.obs, var=None, uns=data.uns, raw=data.raw)
data_novars
#AnnData object with n_obs × n_vars = 3818 × 5127 
#    obs: 'gene_symbols', 'type', 'chr', 'mybatch', 'celltype', 'n_genes', 'n_counts', 'n_targets'
#    uns: 'log1p'
%%R -i data_novars
print(data_novars)

This returns the following, as expected:

class: SingleCellExperiment 
dim: 5127 3818 
metadata(1): log1p
assays(1): X
rownames(5127): 0 1 ... 5125 5126
rowData names(0):
colnames(3818): ENSG00000251562 ENSG00000202198 ... ENSG00000240710
  ENSG00000230928
colData names(8): gene_symbols type ... n_counts n_targets
reducedDimNames(0):
spikeNames(0):

I am using

scanpy==1.4.5.1 anndata==0.7.1 umap==0.3.10 numpy==1.18.1 scipy==1.4.1 pandas==1.0.1 scikit-learn==0.22.1 statsmodels==0.11.0 python-igraph==0.7.1 louvain==0.6.1

And on the R side:

Seurat = 3.0.2

Hi! Happy you enjoy it!

I assume it’s a categorical column causing the error and you don’t have anndata2ri v1.0.2. If I’m right, this is a duplicate of #39 and you can fix it by updating to 1.0.2.

If not, please continue:


“Non-empty” isn’t the issue: as you see, .obs converts fine, and it uses the exact same conversion function. It’s one or more of the .var columns, and you have to find out which and why.

I use rpy2’s pandas2ri to convert obs’ and var’s columns. Therefore I assume that unless it has to do with the rpy2 converters I activate, it’s an rpy2 issue or some column type that’s weird enough to not be supported.
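One way to narrow this down is to run the converter on each column separately and collect the ones that raise. The sketch below is only an illustration: `columns_failing_conversion` and `strict_convert` are hypothetical names, and `strict_convert` is a stand-in that imitates the check shown in the traceback so the example runs without R. On a real setup you would pass `pandas2ri.py2rpy` instead, with whatever rpy2 conversion context your version needs.

```python
import pandas as pd


def columns_failing_conversion(df, convert):
    """Apply `convert` to each column; return {column: error text} for those that raise."""
    failures = {}
    for name, column in df.items():
        try:
            convert(column)
        except Exception as err:  # broad on purpose: we only want to report, not handle
            failures[name] = f"{type(err).__name__}: {err}"
    return failures


def strict_convert(series):
    # Hypothetical stand-in for pandas2ri.py2rpy (an assumption, not rpy2 code):
    # it rejects a Series whose non-None values have more than one Python type.
    kinds = {type(value) for value in series if value is not None}
    if len(kinds) > 1:
        raise ValueError("Series can only be of one type, or None.")
    return list(series)


# Toy replacement for data.var; the column names are borrowed from the report.
var = pd.DataFrame({"gene_symbols": ["A", "B"], "chr": ["1", 2]})
failures = columns_failing_conversion(var, strict_convert)
print(failures)
```

With the real object, something like `columns_failing_conversion(data.var, pandas2ri.py2rpy)` should list the offending column names.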

Which .var column(s) cause(s) the error? What dtype does it / do they have? Is it / are they categorical?
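To answer these questions without involving R at all, you could inspect the columns with pandas alone. The following is a minimal sketch, assuming the failure comes from an object-dtype column that mixes Python types; `find_mixed_type_columns` is a hypothetical helper and the small DataFrame only stands in for `data.var`. It skips `None` in the same way the error message suggests rpy2 does, so a float `NaN` among strings would be reported as a second type.

```python
import pandas as pd


def find_mixed_type_columns(df):
    """Return {column: set of type names} for object columns with more than one value type."""
    mixed = {}
    for name, column in df.items():
        if column.dtype != object:
            # Numeric, bool, categorical, ... columns carry a single dtype already.
            continue
        # Only None is ignored; any other value contributes its Python type.
        kinds = {type(value).__name__ for value in column if value is not None}
        if len(kinds) > 1:
            mixed[name] = kinds
    return mixed


# Toy replacement for data.var; "chr" mixes strings and an integer.
var = pd.DataFrame({
    "gene_symbols": ["A", "B", "C"],
    "chr": ["1", 2, "X"],
    "n_cells": [10, 20, 30],
})
print(var.dtypes)  # shows which columns are object, numeric, or category
mixed = find_mixed_type_columns(var)
print(mixed)
```

Running `data.var.dtypes` and `find_mixed_type_columns(data.var)` on the real object would show the dtypes, including any categorical columns, and the columns with mixed value types.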

So I’m going to assume you had an old version and this is resolved. If not, please comment here and tell me what I asked for in the previous comment.