Error in "wasserstein_distance" function
MartaBenegas opened this issue · 9 comments
Hi CellBLAST team!
I'm new to your Python package and have run into a problem. I downloaded the "Chen" reference panel from your website for a first test and ran the following steps in the Python interpreter:
>>> import numpy as np
>>> import pandas as pd
>>> import tensorflow as tf
>>> import Cell_BLAST as cb
>>> reference = cb.data.ExprDataSet.read_dataset("/home/biobam/Downloads/Chen.h5")
>>> models = []
>>> for i in range(4):
...     models.append(cb.directi.fit_DIRECTi(reference, random_seed=i))
...
>>> blastdb = cb.blast.BLAST(models, reference)
And the last step raises this error, which I don't know how to solve:
>>> blastdb = cb.blast.BLAST(models, reference)
[INFO] Cell BLAST: Projecting to latent space...
[INFO] Cell BLAST: Fitting nearest neighbor trees...
[INFO] Cell BLAST: Sampling from posteriors...
[INFO] Cell BLAST: Generating empirical null distributions...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/biobam/anaconda3/envs/cb/lib/python3.6/site-packages/Cell_BLAST/blast.py", line 473, in __init__
File "/home/biobam/anaconda3/envs/cb/lib/python3.6/site-packages/Cell_BLAST/blast.py", line 615, in _force_components
File "/home/biobam/anaconda3/envs/cb/lib/python3.6/site-packages/Cell_BLAST/blast.py", line 602, in _get_empirical
File "/home/biobam/anaconda3/envs/cb/lib/python3.6/site-packages/joblib/parallel.py", line 1041, in __call__
File "/home/biobam/anaconda3/envs/cb/lib/python3.6/site-packages/joblib/parallel.py", line 859, in dispatch_one_batch
File "/home/biobam/anaconda3/envs/cb/lib/python3.6/site-packages/joblib/parallel.py", line 777, in _dispatch
File "/home/biobam/anaconda3/envs/cb/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
File "/home/biobam/anaconda3/envs/cb/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 572, in __init__
File "/home/biobam/anaconda3/envs/cb/lib/python3.6/site-packages/joblib/parallel.py", line 263, in __call__
File "/home/biobam/anaconda3/envs/cb/lib/python3.6/site-packages/joblib/parallel.py", line 263, in <listcomp>
File "/home/biobam/anaconda3/envs/cb/lib/python3.6/site-packages/numba/core/dispatcher.py", line 414, in _compile_for_args
File "/home/biobam/anaconda3/envs/cb/lib/python3.6/site-packages/numba/core/dispatcher.py", line 357, in error_rewrite
numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(<function wasserstein_distance at 0x7f91740bbd90>) found for signature:
>>> wasserstein_distance(array(float32, 1d, C), array(float32, 1d, C))
There are 2 candidate implementations:
- Of which 2 did not match due to:
Overload in function '_wasserstein_distance': File: Cell_BLAST/blast.py: Line 0.
With argument(s): '(array(float32, 1d, C), array(float32, 1d, C))':
Rejected as the implementation raised a specific error:
RuntimeError: cannot cache function '_wasserstein_distance_impl': no locator available for file '/home/biobam/anaconda3/envs/cb/lib/python3.6/site-packages/Cell_BLAST/blast.py'
raised from /home/biobam/anaconda3/envs/cb/lib/python3.6/site-packages/numba/core/caching.py:352
During: resolving callee type: Function(<function wasserstein_distance at 0x7f91740bbd90>)
During: typing of call at /home/biobam/anaconda3/envs/cb/lib/python3.6/site-packages/Cell_BLAST/blast.py (209)
File "anaconda3/envs/cb/lib/python3.6/site-packages/Cell_BLAST/blast.py", line 209:
<source missing, REPL/exec in use?>
I'm running the Python interpreter in the conda environment created for CellBLAST, following the instructions in the installation guide:
(cb) biobam@biobam-500-526ns:~$ python3
Python 3.6.12 |Anaconda, Inc.| (default, Sep 8 2020, 23:10:56)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
Thank you in advance!
Thanks for the report! Could you please provide the specific versions of numba, scipy and numba in your conda environment? They can be found via conda list | egrep '(numba|scipy|numba)'.
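If filtering the conda list output is inconvenient, the same information can also be collected from inside the interpreter. This is only a minimal sketch: it assumes each package exposes a __version__ attribute (numba, scipy and numpy do), and report_versions is a helper name made up for this example.

```python
import importlib


def report_versions(names):
    """Return {package name: version string, or None if it cannot be imported}."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None  # not installed in this environment
        else:
            # Fall back to "unknown" for modules without a __version__ attribute.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in report_versions(["numba", "scipy", "numpy"]).items():
        print(name, version)
```

Running it inside the activated environment ensures the versions reported are the ones the interpreter actually imports.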
Hi, you mentioned numba twice; maybe you meant pandas or tensorflow? Anyway, here are the versions of those two packages as well, just in case:
(base) biobam@biobam-500-526ns:~$ conda activate cb
(cb) biobam@biobam-500-526ns:~$ conda list | egrep '(numba|scipy|tensorflow|pandas)'
numba 0.52.0 pypi_0 pypi
pandas 1.1.5 pypi_0 pypi
scipy 1.5.4 pypi_0 pypi
tensorflow 1.8.0 pypi_0 pypi
Thanks for the clarification and sorry for the typo... I meant numpy, but the most likely culprits are numba and scipy.
I have tried creating a new environment with the same versions of numba, scipy, pandas and tensorflow, and ran the same lines of code on Chen.h5 data, but I cannot reproduce the error.
Could you please provide the full conda list output so I can check for other differences?
Of course! Here it is:
biobam@biobam-500-526ns:~$ conda activate cb
(cb) biobam@biobam-500-526ns:~$ conda list
# packages in environment at /home/biobam/anaconda3/envs/cb:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
absl-py 0.11.0 pypi_0 pypi
anndata 0.7.5 pypi_0 pypi
astor 0.8.1 pypi_0 pypi
bleach 1.5.0 pypi_0 pypi
ca-certificates 2020.12.8 h06a4308_0
cached-property 1.5.2 pypi_0 pypi
cell-blast 0.3.8 pypi_0 pypi
certifi 2020.12.5 py36h06a4308_0
chardet 3.0.4 pypi_0 pypi
click 7.1.2 pypi_0 pypi
cycler 0.10.0 pypi_0 pypi
decorator 4.4.2 pypi_0 pypi
fastobo 0.9.3 pypi_0 pypi
gast 0.4.0 pypi_0 pypi
grpcio 1.34.1 pypi_0 pypi
h5py 3.1.0 pypi_0 pypi
html5lib 0.9999999 pypi_0 pypi
importlib-metadata 3.4.0 pypi_0 pypi
joblib 1.0.0 pypi_0 pypi
kiwisolver 1.3.1 pypi_0 pypi
ld_impl_linux-64 2.33.1 h53a641e_7
libedit 3.1.20191231 h14c3975_1
libffi 3.3 he6710b0_2
libgcc-ng 9.1.0 hdf63c60_0
libstdcxx-ng 9.1.0 hdf63c60_0
llvmlite 0.35.0 pypi_0 pypi
loompy 3.0.6 pypi_0 pypi
markdown 3.3.3 pypi_0 pypi
matplotlib 3.3.3 pypi_0 pypi
natsort 7.1.0 pypi_0 pypi
ncurses 6.2 he6710b0_1
networkx 2.5 pypi_0 pypi
numba 0.52.0 pypi_0 pypi
numpy 1.19.5 pypi_0 pypi
numpy-groupies 0.9.13 pypi_0 pypi
openssl 1.1.1i h27cfd23_0
packaging 20.8 pypi_0 pypi
pandas 1.1.5 pypi_0 pypi
patsy 0.5.1 pypi_0 pypi
pillow 8.1.0 pypi_0 pypi
pip 20.3.3 py36h06a4308_0
plotly 4.14.3 pypi_0 pypi
pronto 2.3.2 pypi_0 pypi
protobuf 3.14.0 pypi_0 pypi
pynndescent 0.5.1 pypi_0 pypi
pyparsing 2.4.7 pypi_0 pypi
python 3.6.12 hcff3b4d_2
python-dateutil 2.8.1 pypi_0 pypi
python-igraph 0.8.3 pypi_0 pypi
pytz 2020.5 pypi_0 pypi
readline 8.0 h7b6447c_0
retrying 1.3.3 pypi_0 pypi
scikit-learn 0.24.0 pypi_0 pypi
scipy 1.5.4 pypi_0 pypi
seaborn 0.11.1 pypi_0 pypi
setuptools 51.1.2 py36h06a4308_4
six 1.15.0 pypi_0 pypi
sqlite 3.33.0 h62c20be_0
statsmodels 0.12.1 pypi_0 pypi
tensorboard 1.8.0 pypi_0 pypi
tensorflow 1.8.0 pypi_0 pypi
termcolor 1.1.0 pypi_0 pypi
texttable 1.6.3 pypi_0 pypi
threadpoolctl 2.1.0 pypi_0 pypi
tk 8.6.10 hbc83047_0
tqdm 4.56.0 pypi_0 pypi
typing-extensions 3.7.4.3 pypi_0 pypi
umap-learn 0.5.0 pypi_0 pypi
werkzeug 1.0.1 pypi_0 pypi
wheel 0.36.2 pyhd3eb1b0_0
xz 5.2.5 h7b6447c_0
zipp 3.4.0 pypi_0 pypi
zlib 1.2.11 h7b6447c_3
Thanks! I built an identical environment but still could not reproduce the error... Maybe it's because of some external system libraries that the pip-installed packages depend on? In that case, you could try installing all dependencies via conda and using pip only to install Cell BLAST.
Here's a yaml file for environment configuration: debug.yml.gz.
You can build the environment (assuming environment name is "debug") via:
gunzip debug.yml.gz
conda env create -n debug -f debug.yml
conda activate debug
pip install cell-blast
I have verified that this configuration works at least on the machine I'm using. Hope that helps.
Btw, for quicker testing, you can use cb.directi.fit_DIRECTi(reference, random_seed=i, epoch=1).
Just found this: librosa/librosa#1156
They seem to suggest setting NUMBA_CACHE_DIR fixes this problem.
import os
os.environ['NUMBA_CACHE_DIR'] = '/tmp/'  # Or some other writable directory
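One detail worth spelling out: as far as I understand, Numba reads this setting when it is first imported, so the variable should be set before numba (and therefore Cell_BLAST) is imported in the session. A slightly more defensive sketch of the same workaround follows; using a fresh temporary directory is an assumption made for illustration, and any directory the current user can write to should do.

```python
import os
import tempfile

# Assumption: a throwaway temporary directory is acceptable as the cache
# location. Use a persistent path instead to keep compiled functions
# cached across sessions.
cache_dir = tempfile.mkdtemp(prefix="numba_cache_")
os.environ["NUMBA_CACHE_DIR"] = cache_dir

# Confirm the directory really is writable before relying on it.
assert os.access(cache_dir, os.W_OK)

# Only now import the packages that rely on numba caching, e.g.:
# import Cell_BLAST as cb
```

With a temporary directory the cached functions are recompiled in every new session, which costs some start-up time but avoids permission problems.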
Both solutions worked for me, thank you!
Maybe I know where the problem is. I tried to follow the installation guide:
(debug) biobam@biobam-500-526ns:~/Downloads$ conda create -n cbtest python=3.6 && source activate cbtest
[...]
(debug) biobam@biobam-500-526ns:~/Downloads$ conda activate cbtest
But when I tried to install tensorflow as you specified I encountered this problem:
(cbtest) biobam@biobam-500-526ns:~/Downloads$ conda install tensorflow=1.8
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: |
Found conflicts! Looking for incompatible packages.
This can take several minutes. Press CTRL-C to abort.
failed
UnsatisfiableError: The following specifications were found to be incompatible with each other:
Output in format: Requested package -> Available versions
So I installed it without the version info, which installs by default the latest version (2.2.0):
(cbtest) biobam@biobam-500-526ns:~/Downloads$ conda install tensorflow
And finally:
(cbtest) biobam@biobam-500-526ns:~/Downloads$ pip install Cell-BLAST
However, when I tried to start the analysis it raised the following error:
(cbtest) biobam@biobam-500-526ns:~/Downloads$ python3
Python 3.6.12 |Anaconda, Inc.| (default, Sep 8 2020, 23:10:56)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> import pandas as pd
>>> import tensorflow as tf
>>> import Cell_BLAST as cb
OMP: Info #274: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/biobam/anaconda3/envs/cbtest/lib/python3.6/site-packages/Cell_BLAST/__init__.py", line 11, in <module>
from . import (blast, config, data, directi, latent, metrics, prob, rmbatch,
File "/home/biobam/anaconda3/envs/cbtest/lib/python3.6/site-packages/Cell_BLAST/blast.py", line 19, in <module>
from . import config, data, directi, metrics, utils
File "/home/biobam/anaconda3/envs/cbtest/lib/python3.6/site-packages/Cell_BLAST/directi.py", line 16, in <module>
from . import config, data, latent, model, prob, rmbatch, utils
File "/home/biobam/anaconda3/envs/cbtest/lib/python3.6/site-packages/Cell_BLAST/latent.py", line 13, in <module>
from . import module, nn, utils
File "/home/biobam/anaconda3/envs/cbtest/lib/python3.6/site-packages/Cell_BLAST/module.py", line 15, in <module>
class Module(object):
File "/home/biobam/anaconda3/envs/cbtest/lib/python3.6/site-packages/Cell_BLAST/module.py", line 32, in Module
def _save_weights(self, sess: tf.Session, path: str) -> None:
AttributeError: module 'tensorflow' has no attribute 'Session'
And that's because tensorflow 2.x no longer provides the top-level 'Session' attribute.
So, to solve this, I installed tensorflow with pip install tensorflow==1.8 instead.
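A quick way to catch this mismatch before importing Cell_BLAST is to check the installed tensorflow major version first. The snippet below is just a sketch: is_tensorflow_1x is a hypothetical helper written for this example, not part of Cell BLAST or tensorflow.

```python
def is_tensorflow_1x(version):
    """Return True if a version string such as '1.8.0' denotes a 1.x release."""
    return version.split(".")[0] == "1"


try:
    import tensorflow as tf
except ImportError:
    tf = None  # tensorflow is not installed in this environment

if tf is not None and not is_tensorflow_1x(tf.__version__):
    # tf.Session is missing here, so "import Cell_BLAST" would fail as above.
    print("Found tensorflow", tf.__version__, "but a 1.x release is needed")
```

If the message is printed, reinstalling a 1.x release as described above should make the import work again.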
"Maybe it's because of some external system libraries that the pip-installed packages depend on?"
I don't know whether this difference is causing the problem you mention, but I thought you might want to know about it anyway. Here is the whole log if you want to take a look:
log.txt
Hope it helps!
Marta.
I just tried installing tensorflow 1.8 as well, and indeed it no longer works. Tensorflow 1.8 was the version I used during development, which is a bit too old now. Maybe conda removed some dependencies from their default channel... Yes, tensorflow 2.x won't work because of its many backward-incompatible changes, but later versions of tensorflow 1.x (e.g., 1.12) work fine.
Nevertheless, I don't see why the tensorflow installation breaks numba caching...
Anyway, thanks a lot for the elaboration! I will update the installation guide with a newer version of tensorflow 1.x that still installs from conda.
Hi! I encountered similar errors when running cb.blast.BLAST(models, adata). I created the conda environment exactly as described in https://cblast.readthedocs.io/en/latest/BLAST.html, and it does not seem to be a problem with the versions of the Python packages.