How to list values for a query filter?
Closed this issue · 2 comments
HugoCornu commented
For example, is it possible to list all the possible diseases or cell types that can be used in the obs_value_filter? Ideally in Python?
johnkerl commented
@HugoCornu we can put up a sample notebook -- for the moment though here's a snippet. The gist is: read as Pandas and use .groupby().
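import tiledbsoma

def count_obs(exp: tiledbsoma.Experiment, attr_name: str) -> None:
    # Read only the requested obs column, convert to Pandas,
    # then count rows per distinct value, smallest group first.
    print(
        exp.obs.read(column_names=[attr_name])
        .concat()
        .to_pandas()
        .groupby(attr_name)
        .size()
        .sort_values()
    )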
>>> exp = tiledbsoma.Experiment.open(your_uri)
>>> exp.obs.schema
soma_joinid: int64 not null
obs_id: large_string not null
...
cell_type: large_string not null
...
>>> count_obs(exp, 'cell_type')
cell_type
enteroendocrine cell of small intestine            18
paneth cell of epithelium of small intestine       34
transit amplifying cell of small intestine         53
smooth muscle fiber of ileum                       54
mast cell                                          92
endothelial cell of lymphatic vessel               96
pericyte cell                                     121
glial cell                                        175
ileal goblet cell                                 230
progenitor cell                                   382
endothelial cell                                  565
fibroblast                                        571
enterocyte of epithelium proper of ileum          809
innate lymphoid cell                             1382
mononuclear phagocyte                            1635
B cell                                           3183
plasma cell                                      3898
native cell                                      4203
alpha-beta T cell                               14957
dtype: int64
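Once the values are listed, they can be turned into a filter string. The following is a minimal sketch, not part of the original snippet: `build_in_filter` is a hypothetical helper name, and it assumes the filter syntax accepts a Python-style `column in [...]` expression with quoted string literals.

```python
from typing import Iterable


def build_in_filter(column: str, values: Iterable[str]) -> str:
    """Build a 'column in [...]' filter expression (hypothetical helper)."""
    # repr() yields a quoted Python string literal for each value;
    # assumption: the filter parser accepts Python-style literals.
    quoted = ", ".join(repr(v) for v in values)
    return f"{column} in [{quoted}]"


value_filter = build_in_filter("cell_type", ["B cell", "plasma cell"])
print(value_filter)  # cell_type in ['B cell', 'plasma cell']
```

The resulting string could then be passed wherever a value filter is accepted, for example `exp.obs.read(column_names=["cell_type"], value_filter=value_filter)`. Values should be copied exactly as they appear in the listing above, since the match is on the literal string.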
HugoCornu commented
Thanks for the answer and code!
I'll try it on cellxgene (~35 million cells).
I was hoping for a solution that does not download too many rows.
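On the concern about volume: one possible variation, sketched here under assumptions, is to tally values chunk by chunk instead of concatenating everything into a single Pandas frame. This still reads the whole column, so it does not reduce how much data is transferred, but it keeps memory bounded to one chunk plus the running counts. `count_values_chunked` is a hypothetical name; it assumes each chunk behaves like a `pyarrow.Table` (that is, `table.column(name).to_pylist()` works), which is what iterating over `exp.obs.read(...)` is expected to yield.

```python
from collections import Counter
from typing import Any, Iterable


def count_values_chunked(tables: Iterable[Any], attr_name: str) -> Counter:
    """Tally the values of one column across a stream of Arrow-like tables."""
    counts: Counter = Counter()
    for table in tables:
        # Each chunk is assumed to expose column(name).to_pylist(),
        # as pyarrow.Table does; only one chunk is held at a time.
        counts.update(table.column(attr_name).to_pylist())
    return counts


# Hypothetical usage against an open experiment (not run here):
#   counts = count_values_chunked(
#       exp.obs.read(column_names=["cell_type"]), "cell_type"
#   )
#   for value, n in sorted(counts.items(), key=lambda kv: kv[1]):
#       print(value, n)
```

Reading a single column via `column_names` already avoids fetching the other obs attributes; the chunked loop only changes how the result is accumulated.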