theislab/scarches

AssertionError: the error occurs when preparing query data whose feature naming (gene symbols) does not match the reference model feature naming (Ensembl IDs)

Thank you for your great work for the community!

Recently, I followed the example code in https://github.com/theislab/scarches/blob/master/notebooks/hlca_map_classify.ipynb with my own query AnnData object. The code works well with the query data you provide but fails with mine. It throws an error when I run the sum_by function:

Sum any columns with identical gene IDs that have resulted from the mapping. Here we define a short function to do that easily.

def sum_by(adata: ad.AnnData, col: str) -> ad.AnnData:
    adata.strings_to_categoricals()
    assert pd.api.types.is_categorical_dtype(adata.obs[col])

    cat = adata.obs[col].values
    indicator = sparse.coo_matrix(
        (np.broadcast_to(True, adata.n_obs), (cat.codes, np.arange(adata.n_obs))),
        shape=(len(cat.categories), adata.n_obs),
    )

    return ad.AnnData(
        indicator @ adata.X, var=adata.var, obs=pd.DataFrame(index=cat.categories)
    )

adata_query_unprep = sum_by(adata_query_unprep.transpose(), col="gene_ids").transpose()
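For reference, the summing step can be illustrated without AnnData. The following is a minimal sketch on toy data, not the notebook's code: the matrix `X` and the IDs `ENSG_A`, `ENSG_B`, `ENSG_C` are made up for illustration. It builds the same kind of sparse indicator matrix from categorical codes and multiplies it with the data, so that rows sharing an ID are added together.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Toy data (hypothetical): 4 rows (genes, after transposing) x 3 columns (cells).
X = np.array([
    [1, 0, 2],
    [0, 3, 0],
    [4, 0, 1],
    [0, 1, 0],
])

# Hypothetical IDs: rows 0 and 2 share "ENSG_A" and should be summed.
gene_ids = pd.Categorical(["ENSG_A", "ENSG_B", "ENSG_A", "ENSG_C"])

n_rows = len(gene_ids)
# indicator[i, j] == 1 when original row j belongs to category i.
indicator = sparse.coo_matrix(
    (np.ones(n_rows, dtype=X.dtype), (gene_ids.codes, np.arange(n_rows))),
    shape=(len(gene_ids.categories), n_rows),
)

# One output row per unique ID, in the order of gene_ids.categories.
summed = indicator @ X
print(list(gene_ids.categories))
print(summed)
```

The function needs categorical codes for this to work, which is why it asserts that the column is categorical before building the matrix.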

AssertionError                            Traceback (most recent call last)
/tmp/ipykernel_375460/4109603730.py in <module>
----> 1 adata_query_unprep = sum_by(adata_query_unprep.transpose(), col="gene_ids").transpose()

/tmp/ipykernel_375460/1296360838.py in sum_by(adata, col)
      1 def sum_by(adata: ad.AnnData, col: str) -> ad.AnnData:
      2     adata.strings_to_categoricals()
----> 3     assert pd.api.types.is_categorical_dtype(adata.obs[col])
      4
      5     cat = adata.obs[col].values

AssertionError:


My query AnnData object (adata_query_unprep) looks like this:

AnnData object with n_obs × n_vars = 902735 × 1915
    obs: 'dataset'
    var: 'gene_names', 'gene_ids'

adata_query_unprep.var.head(5)
                gene_names         gene_ids
ENSG00000188290       HES4  ENSG00000188290
ENSG00000187608      ISG15  ENSG00000187608
ENSG00000162571     TTLL10  ENSG00000162571
ENSG00000186891   TNFRSF18  ENSG00000186891
ENSG00000186827    TNFRSF4  ENSG00000186827
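One possible explanation, stated as an assumption rather than a confirmed diagnosis: strings_to_categoricals() may leave a string column unchanged when all of its values are unique, and gene_ids appears to contain one distinct Ensembl ID per gene. The sketch below uses only pandas, with the five rows shown above rebuilt by hand, to check uniqueness and to show that an explicit cast always yields a categorical column. The variable names are illustrative.

```python
import pandas as pd

# The five rows printed above, rebuilt as a plain DataFrame for illustration.
ids = [
    "ENSG00000188290",
    "ENSG00000187608",
    "ENSG00000162571",
    "ENSG00000186891",
    "ENSG00000186827",
]
var = pd.DataFrame(
    {
        "gene_names": ["HES4", "ISG15", "TTLL10", "TNFRSF18", "TNFRSF4"],
        "gene_ids": ids,
    },
    index=ids,
)

# If every ID is unique, there are no duplicate genes to sum.
all_unique = var["gene_ids"].is_unique

# An explicit cast produces a categorical column regardless of uniqueness.
gene_ids_cat = var["gene_ids"].astype("category")
is_categorical = isinstance(gene_ids_cat.dtype, pd.CategoricalDtype)

print(all_unique, is_categorical)
```

If the same check on the full var table also reports unique IDs, the summing step would have nothing to merge for this dataset, and the failure would come from the dtype check alone.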

The output of pip list is:

Package Version
absl-py 1.4.0
aiohttp 3.8.4
aiosignal 1.3.1
anndata 0.9.1
anyio 3.7.1
appdirs 1.4.4
argon2-cffi 21.3.0
argon2-cffi-bindings 21.2.0
arrow 1.2.3
asttokens 2.2.1
async-timeout 4.0.2
attrs 23.1.0
backcall 0.2.0
backoff 2.2.1
beautifulsoup4 4.12.2
biopython 1.81
biothings-client 0.3.0
bleach 6.0.0
blessed 1.20.0
certifi 2023.5.7
cffi 1.15.1
charset-normalizer 3.2.0
chex 0.1.7
click 8.1.5
cmake 3.26.4
colorama 0.4.6
comm 0.1.3
contextlib2 21.6.0
contourpy 1.1.0
croniter 1.4.1
cycler 0.11.0
dateutils 0.6.12
debugpy 1.6.7
decorator 5.1.1
deepdiff 6.3.1
defusedxml 0.7.1
diskcache 5.6.1
dm-tree 0.1.8
docrep 0.3.2
etils 1.3.0
exceptiongroup 1.1.2
executing 1.2.0
fastapi 0.100.0
fastjsonschema 2.17.1
filelock 3.12.2
flax 0.7.0
fonttools 4.41.0
fqdn 1.5.1
frozenlist 1.4.0
fsspec 2023.6.0
genomepy 0.16.1
h11 0.14.0
h5py 3.9.0
huggingface-hub 0.16.4
idna 3.4
igraph 0.10.5
importlib-resources 6.0.0
inquirer 3.1.3
ipykernel 6.24.0
ipython 8.14.0
ipython-genutils 0.2.0
ipywidgets 8.0.7
isoduration 20.11.0
itsdangerous 2.1.2
jax 0.4.13
jaxlib 0.4.13
jedi 0.18.2
Jinja2 3.1.2
joblib 1.3.1
jsonpointer 2.4
jsonschema 4.18.3
jsonschema-specifications 2023.6.1
jupyter 1.0.0
jupyter_client 8.3.0
jupyter-console 6.6.3
jupyter_core 5.3.1
jupyter-events 0.6.3
jupyter_server 2.7.0
jupyter_server_terminals 0.4.4
jupyterlab-pygments 0.2.2
jupyterlab-widgets 3.0.8
kiwisolver 1.4.4
leidenalg 0.10.0
lightning 2.0.5
lightning-cloud 0.5.37
lightning-utilities 0.9.0
lit 16.0.6
llvmlite 0.40.1
loguru 0.7.0
loompy 3.0.7
markdown-it-py 3.0.0
MarkupSafe 2.1.3
matplotlib 3.7.2
matplotlib-inline 0.1.6
mdurl 0.1.2
mistune 3.0.1
ml-collections 0.1.1
ml-dtypes 0.2.0
mpmath 1.3.0
msgpack 1.0.5
mudata 0.2.3
multidict 6.0.4
multipledispatch 1.0.0
mygene 3.2.2
mysql-connector-python 8.0.33
natsort 8.4.0
nbclassic 1.0.0
nbclient 0.8.0
nbconvert 7.6.0
nbformat 5.9.1
nest-asyncio 1.5.6
networkx 3.1
norns 0.1.6
nose 1.3.7
notebook 6.5.4
notebook_shim 0.2.3
numba 0.57.1
numpy 1.24.4
numpy-groupies 0.9.22
numpyro 0.12.1
nvidia-cublas-cu11 11.10.3.66
nvidia-cuda-cupti-cu11 11.7.101
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
nvidia-cufft-cu11 10.9.0.58
nvidia-curand-cu11 10.2.10.91
nvidia-cusolver-cu11 11.4.0.1
nvidia-cusparse-cu11 11.7.4.91
nvidia-nccl-cu11 2.14.3
nvidia-nvtx-cu11 11.7.91
opt-einsum 3.3.0
optax 0.1.5
orbax-checkpoint 0.2.7
ordered-set 4.1.0
overrides 7.3.1
packaging 23.1
pandas 2.0.3
pandocfilters 1.5.0
parso 0.8.3
patsy 0.5.3
pexpect 4.8.0
pickleshare 0.7.5
Pillow 10.0.0
pip 23.1.2
platformdirs 3.8.1
prometheus-client 0.17.1
prompt-toolkit 3.0.39
protobuf 3.20.3
psutil 5.9.5
ptyprocess 0.7.0
pure-eval 0.2.2
pycparser 2.21
pydantic 1.10.11
pyfaidx 0.7.2.1
Pygments 2.15.1
PyJWT 2.7.0
pymde 0.1.18
pynndescent 0.5.10
pyparsing 3.0.9
pyro-api 0.1.2
pyro-ppl 1.8.5
python-dateutil 2.8.2
python-editor 1.0.4
python-igraph 0.10.5
python-json-logger 2.0.7
python-multipart 0.0.6
pytorch-lightning 2.0.5
pytz 2023.3
PyYAML 6.0
pyzmq 25.1.0
qtconsole 5.4.3
QtPy 2.3.1
readchar 4.0.5
referencing 0.29.1
requests 2.31.0
rfc3339-validator 0.1.4
rfc3986-validator 0.1.1
rich 13.4.2
rpds-py 0.8.10
scanpy 1.9.3
scikit-learn 1.3.0
scikit-misc 0.3.0
scipy 1.11.1
scvi-colab 0.12.0
scvi-tools 1.0.2
seaborn 0.12.2
Send2Trash 1.8.2
session-info 1.0.0
setuptools 67.8.0
six 1.16.0
sniffio 1.3.0
soupsieve 2.4.1
sparse 0.14.0
stack-data 0.6.2
starlette 0.27.0
starsessions 1.3.0
statsmodels 0.14.0
stdlib-list 0.9.0
sympy 1.12
tensorstore 0.1.40
terminado 0.17.1
texttable 1.6.7
threadpoolctl 3.2.0
tinycss2 1.2.1
toolz 0.12.0
torch 2.0.1
torchmetrics 1.0.1
torchvision 0.15.2
tornado 6.3.2
tqdm 4.65.0
traitlets 5.9.0
triton 2.0.0
typing_extensions 4.7.1
tzdata 2023.3
umap-learn 0.5.3
uri-template 1.3.0
urllib3 2.0.3
uvicorn 0.23.0
wcwidth 0.2.6
webcolors 1.13
webencodings 0.5.1
websocket-client 1.6.1
websockets 11.0.3
wheel 0.38.4
widgetsnbextension 4.0.8
xarray 2023.6.0
yarl 1.9.2
zipp 3.16.2