ValueError: Categorical categories cannot be null

Question


wuyinbuquan97 opened this issue 2 years ago · 1 comment

When I run CAME on my own data:
outputs = pipeline.main_for_unaligned(
    **came_inputs,
    df_varmap=df_varmap,
    df_varmap_1v1=df_varmap_1v1,
    dataset_names=dsnames,
    key_class1=key_class1,
    key_class2=key_class2,
    do_normalize=True,
    keep_non1v1_feats=keep_non1v1_feats,
    n_epochs=n_epochs,
    resdir=resdir,
    n_pass=n_pass,
    batch_size=batch_size,
    plot_results=True,
)

I get the following output:
already exists:
/work/home/luo_funong/shanhuiquan/came/test/res/figs
already exists:
/work/home/luo_funong/shanhuiquan/came/test/res
trans.shape= (570, 567)
[] Setting dataset names:
0-->liver_cattle
1-->liver_human
[] Setting aligned features for observation nodes (self._features)
[] Setting un-aligned features (self._ov_adjs) for making links connecting observation and variable nodes
[] Setting adjacent matrix connecting variables from these 2 datasets (self._vv_adj)
-------------------- Summary of the DGL-Heterograph --------------------
Graph(num_nodes={'cell': 8996, 'gene': 7024},
num_edges={('cell', 'express', 'gene'): 3419993, ('cell', 'self_loop_cell', 'cell'): 8996, ('cell', 'similar_to', 'cell'): 59332, ('gene', 'expressed_by', 'cell'): 3419993, ('gene', 'homolog_with', 'gene'): 12196},
metagraph=[('cell', 'gene', 'express'), ('cell', 'cell', 'self_loop_cell'), ('cell', 'cell', 'similar_to'), ('gene', 'cell', 'expressed_by'), ('gene', 'gene', 'homolog_with')])
second-order connection: False
self-loops for observation-nodes: True
self-loops for variable-nodes: True

DataPair with 8996 obs- and 7024 var-nodes
obs1 x var1 (liver_cattle): 3989 x 3196
obs2 x var2 (liver_human): 5007 x 3828
Dimensions of the obs-node-features: 570
/work/home/luo_funong/miniconda3/envs/env_came/lib/python3.8/site-packages/sklearn/utils/validation.py:727: FutureWarning: np.matrix usage is deprecated in 1.0 and will raise a TypeError in 1.2. Please convert to a numpy array with np.asarray. For more information see: https://numpy.org/doc/stable/reference/generated/numpy.matrix.html
warnings.warn(
/work/home/luo_funong/miniconda3/envs/env_came/lib/python3.8/site-packages/sklearn/utils/validation.py:727: FutureWarning: np.matrix usage is deprecated in 1.0 and will raise a TypeError in 1.2. Please convert to a numpy array with np.asarray. For more information see: https://numpy.org/doc/stable/reference/generated/numpy.matrix.html
warnings.warn(
Traceback (most recent call last):
File "", line 1, in
File "/work/home/luo_funong/miniconda3/envs/env_came/lib/python3.8/site-packages/came/pipeline.py", line 436, in main_for_unaligned
ENV_VARs = prepare4train(dpair, key_class=keys, batch_keys=batch_keys,)
File "/work/home/luo_funong/miniconda3/envs/env_came/lib/python3.8/site-packages/came/utils/train.py", line 107, in prepare4train
labels, classes = dpair.get_obs_labels(
File "/work/home/luo_funong/miniconda3/envs/env_came/lib/python3.8/site-packages/came/datapair/unaligned.py", line 385, in get_obs_labels
labels_cat = pd.Categorical(list(labels_12[0]) + list(labels_12[1]),
File "/work/home/luo_funong/miniconda3/envs/env_came/lib/python3.8/site-packages/pandas/core/arrays/categorical.py", line 378, in __init__
dtype = CategoricalDtype._from_values_or_dtype(
File "/work/home/luo_funong/miniconda3/envs/env_came/lib/python3.8/site-packages/pandas/core/dtypes/dtypes.py", line 299, in _from_values_or_dtype
dtype = CategoricalDtype(categories, ordered)
File "/work/home/luo_funong/miniconda3/envs/env_came/lib/python3.8/site-packages/pandas/core/dtypes/dtypes.py", line 186, in __init__
self._finalize(categories, ordered, fastpath=False)
File "/work/home/luo_funong/miniconda3/envs/env_came/lib/python3.8/site-packages/pandas/core/dtypes/dtypes.py", line 340, in _finalize
categories = self.validate_categories(categories, fastpath=fastpath)
File "/work/home/luo_funong/miniconda3/envs/env_came/lib/python3.8/site-packages/pandas/core/dtypes/dtypes.py", line 534, in validate_categories
raise ValueError("Categorical categories cannot be null")
ValueError: Categorical categories cannot be null
How can I fix this?
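The last frames of the traceback show where the error originates: `get_obs_labels` builds a `pd.Categorical`, and pandas refuses any category that is null. The `categories` argument of that call is cut off in the traceback, so the sketch below only assumes that an unlabeled cell (NaN) ends up in the category list; the cell-type names are made up for illustration. It also shows that once the null is dropped from the categories, the same values are accepted and the unlabeled cell simply gets the missing code -1.

```python
import numpy as np
import pandas as pd

# Hypothetical labels: the third reference cell has no annotation (NaN)
labels1 = ["hepatocyte", "kupffer", np.nan]
labels2 = ["hepatocyte", "hepatocyte"]
values = labels1 + labels2

# Taking the unique raw labels keeps NaN as a category,
# which pandas rejects with a ValueError like the one in the traceback
categories = pd.unique(pd.Series(labels1))

message = None
try:
    pd.Categorical(values, categories=categories)
except ValueError as err:
    message = str(err)
    print(message)

# Dropping the null from the categories avoids the error;
# the unlabeled value is stored with code -1 (missing)
clean_categories = pd.Series(labels1).dropna().unique()
cat = pd.Categorical(values, categories=clean_categories)
print(cat.codes)
```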

Answer 1 · 2023-06-12T01:46:14.000Z

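Your reference data's cell-type annotations probably contain missing values. I suggest filtering out the cells whose labels are missing first, and then rerunning the CAME pipeline: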

# Check whether any labels are missing
cate_counts = adata_raw1.obs[key_class1].value_counts(dropna=False)
print(cate_counts)
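The filtering step itself could look like the sketch below. It assumes `adata_raw1` is an AnnData object (as the `.obs` access suggests) and should run before the data are preprocessed into `came_inputs`; to keep the snippet self-contained, a small pandas DataFrame stands in for `adata_raw1.obs`, and the column name `cell_type` and the cell IDs are hypothetical.

```python
import numpy as np
import pandas as pd

# Stand-in for adata_raw1.obs; "cell_type" plays the role of key_class1 (hypothetical)
key_class1 = "cell_type"
obs = pd.DataFrame(
    {key_class1: ["hepatocyte", np.nan, "kupffer", "hepatocyte"]},
    index=["c1", "c2", "c3", "c4"],
)

# Boolean mask: True for cells that carry a label, False for NaN/None
keep = obs[key_class1].notna()
obs_filtered = obs[keep]
print(obs_filtered.index.tolist())

# On a real AnnData object the same mask subsets the cells (not run here):
#   adata_raw1 = adata_raw1[adata_raw1.obs[key_class1].notna()].copy()
```

`notna()` only catches true missing values; labels stored as strings such as "nan" or "" would need to be removed separately. Since `get_obs_labels` concatenates the labels of both datasets (see `labels_12[0]` and `labels_12[1]` in the traceback), the same check is worth running on the second dataset if `key_class2` points to an annotated column.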