Bug report: obs columns with NaN cause VIP error
yyoshiaki opened this issue · 2 comments
Hi,
I noticed that when obs columns contain NaN, VIP cannot recognize cells correctly, as shown in the attached image.
Example:
adata.obs['IR_VJ_1_c_call']
index
AAACCTGAGACAAGCC-1-0 IGKC
AAACCTGAGAGTCGGT-1-0 IGKC
AAACCTGAGCACACAG-1-0 NaN
AAACCTGAGGAATCGC-1-0 IGLC1
AAACCTGAGTGAAGAG-1-0 IGKC
...
TTTGTCAGTTAAGAAC-1-11 IGLC2
TTTGTCATCAAACCGT-1-11 IGLC2
TTTGTCATCAACACCA-1-11 IGLC2
TTTGTCATCCCAAGTA-1-11 NaN
TTTGTCATCCGAGCCA-1-11 IGLC2
Name: IR_VJ_1_c_call, Length: 78638, dtype: category
Categories (5, object): ['IGKC', 'IGLC1', 'IGLC2', 'IGLC3', 'IGLC7']
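Converting the affected obs columns to str and then back to category, so that NaN becomes an ordinary string value, resolved the error.
adata_cg = adata.raw.to_adata()
for c in adata_cg.obs.columns:
    # Only convert columns that contain missing values.
    if adata_cg.obs[c].isna().sum() > 0:
        # Cast to str so NaN becomes a string, then back to category.
        adata_cg.obs[c] = adata_cg.obs[c].astype(str).astype("category")
adata_cg.write(results_file_cellxgene)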
Though this is probably a rare case, I am reporting it because a fix would improve cellxgene_VIP.
best,
Yoshi
I think PR #65 might solve your issue, but a maintainer will need to confirm whether my changes are valid.
Hi @yyoshiaki, thanks for the report, and @michaeleekk for the proposed fix.
We currently don't allow "NaN" or "Null" in the obs categorical annotations because they have consequences for "Abbr. & Combine". For now, we suggest that you change "NaN" or "Null" to "NA" in your obs.
We will add a message if we detect "NaN" or "Null" ('undefined' in JS).
Please reopen this if you still encounter a problem after changing it to "NA".
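For anyone landing here later, the suggested change to "NA" can be sketched with plain pandas. This is a minimal sketch and not part of cellxgene_VIP: the helper name `fill_categorical_nan` and the toy `obs` table (a stand-in for `adata.obs`, using barcodes from the report above) are assumptions made for illustration. It only touches categorical columns, so numeric columns with missing values keep their dtype.

```python
import numpy as np
import pandas as pd


def fill_categorical_nan(obs: pd.DataFrame, placeholder: str = "NA") -> pd.DataFrame:
    """Hypothetical helper: return a copy of `obs` in which missing values
    in categorical columns are replaced by an explicit placeholder category."""
    obs = obs.copy()
    for name in obs.columns:
        col = obs[name]
        # Skip non-categorical columns and columns without missing values.
        if not isinstance(col.dtype, pd.CategoricalDtype) or not col.isna().any():
            continue
        # The placeholder must be a declared category before fillna can use it.
        if placeholder not in col.cat.categories:
            col = col.cat.add_categories([placeholder])
        obs[name] = col.fillna(placeholder)
    return obs


# Toy stand-in for adata.obs, with one missing value as in the report.
obs = pd.DataFrame(
    {"IR_VJ_1_c_call": pd.Categorical(["IGKC", "IGKC", np.nan, "IGLC1"])},
    index=[
        "AAACCTGAGACAAGCC-1-0",
        "AAACCTGAGAGTCGGT-1-0",
        "AAACCTGAGCACACAG-1-0",
        "AAACCTGAGGAATCGC-1-0",
    ],
)

fixed = fill_categorical_nan(obs)
print(fixed["IR_VJ_1_c_call"].tolist())
```

With an AnnData object, the same helper could presumably be applied as `adata.obs = fill_categorical_nan(adata.obs)` before writing the h5ad file. Adding the placeholder as a category first keeps the column categorical, rather than round-tripping through str.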