Teichlab/celltypist

Running `celltypist.annotate` with `min_prop` can't create "Heterogeneous" category

DanScarc opened this issue · 2 comments

Description

Please find a minimal example reproducing the error below.

Example

# Library imports
import scanpy as sc 
import celltypist # v. 1.6.1
from celltypist import models

# Data loading
adata = sc.datasets.pbmc3k()

# Adapt adata for compatibility with celltypist
adata_celltypist = adata.copy()
sc.pp.normalize_total(
    adata_celltypist, target_sum=10**4
)
sc.pp.log1p(adata_celltypist)
adata_celltypist.X = adata_celltypist.X.toarray()

# Download celltypist models
models.download_models(
    force_update=True, model=["Immune_All_Low.pkl"]
)
model_low = models.Model.load(model="Immune_All_Low.pkl")

# Predict cell types
predictions_low = celltypist.annotate(
    adata_celltypist,
    model=model_low,
    majority_voting=True,
    mode="best match",
    min_prop=0.7,
)

Returns

File ~/miniforge3/envs/preprocessing/lib/python3.9/site-packages/celltypist/classifier.py:473, in Classifier.majority_vote(predictions, over_clustering, min_prop)
    471 majority = votes.idxmax(axis=0)
    472 freqs = (votes / votes.sum(axis=0).values).max(axis=0)
--> 473 majority[freqs < min_prop] = 'Heterogeneous'
    474 majority = majority[over_clustering].reset_index()
    475 majority.index = predictions.predicted_labels.index
.
.
.

TypeError: Cannot setitem on a Categorical with a new category (Heterogeneous), set the categories first
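The same error can be reproduced with pandas alone. This is a minimal sketch, assuming only that `majority` (the output of `votes.idxmax(axis=0)`) has a categorical dtype. The labels, cluster names and frequencies below are made up for illustration. It also shows that registering "Heterogeneous" as a category first lets the assignment go through.

```python
import pandas as pd

# Stand-in for `majority = votes.idxmax(axis=0)`: one predicted label per
# cluster, stored with a categorical dtype (illustrative values only).
majority = pd.Series(
    pd.Categorical(["T cells", "B cells", "T cells"]),
    index=["0", "1", "2"],
)
# Stand-in for `freqs`: share of each cluster's cells carrying that label.
freqs = pd.Series([0.95, 0.55, 0.80], index=["0", "1", "2"])
min_prop = 0.7

# Same assignment as line 473 of classifier.py in the traceback above.
try:
    majority[freqs < min_prop] = "Heterogeneous"
    error = None
except (TypeError, ValueError) as exc:
    error = exc
print(repr(error))

# Adding the new category before assigning avoids the error.
majority = majority.cat.add_categories("Heterogeneous")
majority[freqs < min_prop] = "Heterogeneous"
print(majority.tolist())  # ['T cells', 'Heterogeneous', 'T cells']
```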

Environment

My current environment is:

name: preprocessing
channels:
  - bioconda
  - conda-forge
dependencies:
  - conda-forge::jupyterlab=3.5.0
  - conda-forge::leidenalg=0.9.1
  - conda-forge::numba=0.56.4
  - conda-forge::joypy
  - conda-forge::python=3.9.15
  - conda-forge::r-base=4.1.3
  - conda-forge::r-soupx=1.6.1
  - conda-forge::r-sctransform=0.3.3
  - conda-forge::r-glmpca=0.2.0
  - conda-forge::rpy2=3.5.11
  - conda-forge::scanpy=1.9.3
  - conda-forge::session-info=1.0.0
  - bioconda::celltypist
  - bioconda::anndata2ri=1.1
  - bioconda::bioconductor-scdblfinder=1.8.0
  - bioconda::bioconductor-scry=1.6.0
  - bioconda::bioconductor-scran=1.22.1
  - bioconda::bioconductor-glmgampoi=1.6.0

Thank you in advance!

@DanScarc, this is most likely caused by a behavior change in newer versions of pandas, which return the output of `idxmax` as categorical. Assigning the new label "Heterogeneous" then fails because it is not one of the existing categories. You can either downgrade pandas or upgrade to the newest version of celltypist (1.6.2), which should fix this issue.
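For anyone who cannot upgrade right away, here is a hedged sketch of the voting step shown in the traceback, with the dtype problem avoided by casting the per-cluster labels to plain strings before assigning "Heterogeneous". `majority_vote_safe` is a hypothetical name. This is not celltypist's own code or its 1.6.2 fix. It assumes the predicted labels and the cluster assignments are two pandas Series indexed by cell.

```python
import pandas as pd


def majority_vote_safe(
    predicted_labels: pd.Series, over_clustering: pd.Series, min_prop: float = 0.0
) -> pd.Series:
    """Hypothetical helper mirroring the majority-vote lines in the traceback.

    Assumes `predicted_labels` and `over_clustering` share the same cell index.
    Not celltypist's implementation; an illustrative sketch only.
    """
    # Rows: predicted labels; columns: clusters; values: number of cells.
    votes = pd.crosstab(predicted_labels, over_clustering)
    # Most common label per cluster; newer pandas may keep a categorical dtype here.
    majority = votes.idxmax(axis=0)
    # Cast to plain strings so a label outside the original categories can be set.
    majority = majority.astype(str)
    # Fraction of each cluster's cells that carry its majority label.
    freqs = (votes / votes.sum(axis=0).values).max(axis=0)
    majority[freqs < min_prop] = "Heterogeneous"
    # Broadcast each cluster's label back to the cells in that cluster.
    return pd.Series(
        majority.loc[over_clustering.values].values,
        index=predicted_labels.index,
        name="majority_voting",
    )
```

With `min_prop=0.7`, a cluster whose most common label covers only two of its three cells would be labelled "Heterogeneous" for all of its cells.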

This should now be fixed. Please reopen the issue if you have further questions.
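To check whether the installed celltypist is at least the release the maintainer points to (1.6.2), a small sketch using the standard library's `importlib.metadata` could look like this. The helper names are hypothetical, and the version parsing is a simplification that keeps only the leading numeric part (so a pre-release such as `1.6.2rc1` would be treated as `1.6.2`).

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Release the maintainer says should contain the fix.
FIXED_IN = (1, 6, 2)


def parse_release(v: str) -> tuple:
    """Return the leading numeric components, e.g. '1.6.2.post1' -> (1, 6, 2)."""
    m = re.match(r"\d+(?:\.\d+)*", v)
    return tuple(int(p) for p in m.group(0).split(".")) if m else ()


def includes_fix(v: str) -> bool:
    """True if version string `v` is at or above FIXED_IN (simplified check)."""
    return parse_release(v) >= FIXED_IN


try:
    installed = version("celltypist")
    print(f"celltypist {installed}: at or above 1.6.2 = {includes_fix(installed)}")
except PackageNotFoundError:
    print("celltypist is not installed")
```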