gao-lab/Cell_BLAST

reconcile_models() problems


Hi @Jeff1995, I ran the code
data_obj2_hits = data_obj2_hits.reconcile_models().filter(by="pval", cutoff=0.05)
and got the error below:
IndexError Traceback (most recent call last)
in
----> 1 data_obj2_hits = data_obj2_hits.reconcile_models().filter(by="pval", cutoff=0.05)

/usr/local/lib/python3.6/dist-packages/Cell_BLAST/blast.py in reconcile_models(self, dist_method, pval_method)
996 """
997 dist_method = self._get_reconcile_method(dist_method)
--> 998 dist = [dist_method(item, axis=1) for item in self.dist]
999 pval_method = self._get_reconcile_method(pval_method)
1000 pval = [pval_method(item, axis=1) for item in self.pval]

/usr/local/lib/python3.6/dist-packages/Cell_BLAST/blast.py in (.0)
996 """
997 dist_method = self._get_reconcile_method(dist_method)
--> 998 dist = [dist_method(item, axis=1) for item in self.dist]
999 pval_method = self._get_reconcile_method(pval_method)
1000 pval = [pval_method(item, axis=1) for item in self.pval]

<array_function internals> in mean(*args, **kwargs)

/usr/local/lib/python3.6/dist-packages/numpy/core/fromnumeric.py in mean(a, axis, dtype, out, keepdims)
3255
3256 return _methods._mean(a, axis=axis, dtype=dtype,
-> 3257 out=out, **kwargs)
3258
3259

/usr/local/lib/python3.6/dist-packages/numpy/core/_methods.py in _mean(a, axis, dtype, out, keepdims)
136
137 is_float16_result = False
--> 138 rcount = _count_reduce_items(arr, axis)
139 # Make this warning show up first
140 if rcount == 0:

/usr/local/lib/python3.6/dist-packages/numpy/core/_methods.py in _count_reduce_items(arr, axis)
55 items = 1
56 for ax in axis:
---> 57 items *= arr.shape[ax]
58 return items
59

IndexError: tuple index out of range

I have no idea how to fix this.

Thanks for the report!

It's also not immediately clear to me why this happens. Could you please run the following lines before .reconcile_models() to save these objects as pickle files and post them here? That may help track down the problem. Thanks!

import pickle
with open("debug_hits_dist.pkl", "wb") as f:
    pickle.dump(data_obj2_hits.dist, f)
with open("debug_hits_pval.pkl", "wb") as f:
    pickle.dump(data_obj2_hits.pval, f)
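Once the pickles are shared, the first thing worth checking is the shape of each array, since the traceback fails while reading arr.shape[ax]. A minimal sketch for doing that, assuming (as the list comprehension in the traceback suggests) that each pickle holds an iterable of NumPy arrays; describe_pickled_arrays is a hypothetical helper name, not part of Cell_BLAST:

```python
import pickle

import numpy as np


def describe_pickled_arrays(path):
    """Load a pickled list of arrays and return the shape of each element.

    Assumes the pickle holds an iterable of NumPy arrays, which is what the
    .dist / .pval attributes appear to be in this thread.
    """
    with open(path, "rb") as f:
        arrays = pickle.load(f)
    return [np.asarray(item).shape for item in arrays]


# Example (file name from the snippet above):
# print(describe_pickled_arrays("debug_hits_dist.pkl"))
```

A 2-D shape such as (n_hits, n_models) is what a mean over axis=1 expects; a 1-D shape would reproduce the IndexError.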

@Jeff1995 OK, here are the two pkl files, zipped as debug.zip.

It seems that I cannot reproduce the error under numpy 1.14.6, so I suspect it's a numpy version issue. Which numpy version are you using?

I'm using numpy 1.17.2.

I think I figured it out. It was not a numpy version problem, but rather because only one DIRECTi model was used in BLAST. In that case the singleton "model" dimension (axis=1) in the hits.dist arrays was missing, so taking the mean over axis=1 referred to a non-existent axis.
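The failure mode described above can be reproduced with plain NumPy, independent of Cell_BLAST. In this sketch the arrays are hypothetical stand-ins for one element of hits.dist: a 2-D array of shape (n_hits, n_models) when the model axis is present, and a 1-D array of shape (n_hits,) when it is missing. The mean over axis=1 mirrors the call in the traceback.

```python
import numpy as np


def mean_over_model_axis(item):
    """Average over axis=1, as reconcile_models() does according to the traceback."""
    return np.mean(item, axis=1)


# Stand-in with a model axis: shape (n_hits, n_models)
with_model_axis = np.ones((5, 4))
print(mean_over_model_axis(with_model_axis).shape)  # (5,)

# Stand-in without a model axis: shape (n_hits,)
without_model_axis = np.ones(5)
try:
    mean_over_model_axis(without_model_axis)
    raised = False
except IndexError as err:  # newer numpy raises AxisError, a subclass of IndexError
    raised = True
    print("IndexError:", err)
```

Older numpy versions (such as 1.17 in this thread) report this as "tuple index out of range"; newer versions raise numpy.AxisError with a clearer message, but catching IndexError handles both.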

If only one model was used, .reconcile_models() is not necessary. You can just remove .reconcile_models() and continue with downstream steps.

Meanwhile, .reconcile_models() should also work when only one model is used (in which case it should simply do nothing). This will be fixed in a future release.
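One way such a "do nothing when there is no model axis" guard could look is sketched below. This is a hypothetical helper operating on a list of NumPy arrays, not the actual Cell_BLAST fix; it assumes the model axis, when present, is axis=1 as in the traceback. A side effect of the guard is that calling it twice is harmless.

```python
import numpy as np


def reconcile_if_needed(arrays, method=np.mean):
    """Hypothetical helper (not part of Cell_BLAST): reduce the model axis
    (axis=1) only when an array actually has one; otherwise pass it through."""
    return [method(item, axis=1) if np.ndim(item) > 1 else np.asarray(item)
            for item in arrays]


# Two queries, four models each: shapes (n_hits, n_models)
multi = [np.ones((3, 4)), np.ones((2, 4))]
once = reconcile_if_needed(multi)   # shapes (3,) and (2,)
twice = reconcile_if_needed(once)   # no error: 1-D arrays are returned unchanged
```

Checking np.ndim rather than catching the exception keeps the intent explicit: arrays without a model axis are treated as already reconciled.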

But I used 4 models earlier; here is the code:

import Cell_BLAST as cb

# data_obj, selected_genes and data_obj2 were defined earlier in the notebook
models = []
for i in range(4):
    models.append(cb.directi.fit_DIRECTi(
        data_obj, genes=selected_genes,
        latent_dim=10, cat_dim=20, random_seed=i
    ))
blast = cb.blast.BLAST(models, data_obj)
data_obj2_hits = blast.query(data_obj2)
data_obj2_hits = data_obj2_hits.reconcile_models().filter(by="pval", cutoff=0.05)  # error here

Well, that would be strange. Can you confirm that the data_obj2_hits.reconcile_models() line was not executed more than once? If so, could you provide the data_obj object (as an h5 file) and the selected_genes object (as a text file), so that I can try to reproduce the error?

I did not use selected_genes, axes = data_obj.find_variable_genes() to produce the gene list; instead I used the highly variable (HV) genes I had found earlier. Could that cause this problem? Also, how can I save data_obj? It was created by data_obj = cb.data.ExprDataSet(exprs=adata.X, obs=adata.obs, var=adata.var, uns=adata.uns)

I think the gene list shouldn't be the cause. You can save the data_obj with data_obj.write_dataset("filename.h5").

I tried this data (using the training data data_obj as the query, since I do not have data_obj2), but I could not reproduce the error with the following script:

import pandas as pd
import Cell_BLAST as cb

data_obj = cb.data.ExprDataSet.read_dataset("data1.h5")
selected_genes = pd.read_csv("gene.csv", index_col=0).to_numpy().ravel().tolist()

models = []
for i in range(4):
    models.append(cb.directi.fit_DIRECTi(
        data_obj, genes=selected_genes,
        latent_dim=10, cat_dim=20, random_seed=i
    ))

blast = cb.blast.BLAST(models, data_obj)
data_obj_hits = blast.query(data_obj)
data_obj_hits = data_obj_hits.reconcile_models().filter(by="pval", cutoff=0.05)

print("Done!")

Could you please try running this as a Python script (not as a Jupyter notebook) and see if it works on your side?

If the error persists, it would most likely be an environment issue. You may need to provide your detailed environment specification (via conda env export) so I can try to reproduce it.
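Alongside the full conda env export, a quick way to report the versions discussed earlier in this thread (Python and numpy) is a short snippet like the one below. It is only a convenience sketch and does not replace the full environment specification; environment_summary is a hypothetical helper name.

```python
import platform
import sys

import numpy as np


def environment_summary():
    """Collect a few version details that are often relevant to bug reports."""
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "numpy": np.__version__,
    }


for key, value in environment_summary().items():
    print(f"{key}: {value}")
```

Running it both in the Jupyter kernel and from the terminal also shows whether the two use the same interpreter and numpy.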

@Jeff1995 It works as a script, but still fails in the Jupyter notebook.

Okay. I think the most likely cause is that you ran the following line more than once in the Jupyter notebook:

data_obj_hits = data_obj_hits.reconcile_models().filter(by="pval", cutoff=0.05)

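It should be run only once. Running it a second time produces the error: the first call has already averaged away the model axis (axis=1) of each dist and pval array, so the second call tries to reduce over an axis that no longer exists.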
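A practical pattern in a notebook is to keep the output of blast.query() in its own variable (for example a hypothetical data_obj2_hits_raw) and always reconcile from that variable, so re-running the cell is safe. The sketch below illustrates why, using ToyHits, a hypothetical stand-in that only mimics the axis=1 averaging seen in the traceback; it is NOT the Cell_BLAST hits class and omits .filter().

```python
import numpy as np


class ToyHits:
    """Hypothetical stand-in for a BLAST hits object; NOT the Cell_BLAST class."""

    def __init__(self, dist):
        self.dist = dist

    def reconcile_models(self):
        # Mirrors the traceback: average each array over axis=1 (the model axis)
        return ToyHits([np.mean(item, axis=1) for item in self.dist])


# Two queries, four models each: shapes (n_hits, n_models)
raw_hits = ToyHits([np.ones((3, 4)), np.ones((2, 4))])

# Safe: always reconcile from the untouched object, so re-running this
# line (e.g. re-executing a notebook cell) gives the same result.
hits = raw_hits.reconcile_models()
hits = raw_hits.reconcile_models()

# Unsafe: overwriting the variable means a second run tries to reduce
# an axis that no longer exists.
hits2 = raw_hits.reconcile_models()
try:
    hits2 = hits2.reconcile_models()
    second_call_failed = False
except IndexError as err:  # AxisError in newer numpy, a subclass of IndexError
    second_call_failed = True
    print("second call failed:", err)
```

The same idea applies to the real objects: only the variable holding the raw query result should ever be passed to .reconcile_models().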

Thanks!