koursaros-ai/nboost

CPU based TF error

Closed this issue · 7 comments

0mars commented

I'm getting this error on startup. I changed the Docker image since I don't have a GPU on my server, but it still complains about a missing libcuda.

I've tried building the image myself, and I also tried tensorflow/tensorflow without the GPU tag (tensorflow/tensorflow:1.15.0-py3):

nboost_1                 | C:BertModel:[pro:run:273]:Upstream host is data.humanoyd.com:80
nboost_1                 | 2019-12-10 06:30:49.581369: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
nboost_1                 | 2019-12-10 06:30:49.619399: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2294605000 Hz
nboost_1                 | 2019-12-10 06:30:49.622090: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f3a75d30d10 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
nboost_1                 | 2019-12-10 06:30:49.622150: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
nboost_1                 | 2019-12-10 06:30:49.629999: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
nboost_1                 | 2019-12-10 06:30:49.630250: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: UNKNOWN ERROR (303)
nboost_1                 | 2019-12-10 06:30:49.630376: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (a9720d7c87ae): /proc/driver/nvidia/version does not exist
nboost_1                 | 2019-12-10 06:30:50.895906: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 93763584 exceeds 10% of system memory.

Is this related to a dependency of nboost, or is there something I should do? Any hints are appreciated. I imagine a lot of people would like to run on CPU, since this is more of a server workload and not many providers offer GPUs.

This logging output is standard when TensorFlow 1 runs in a non-GPU environment - it does not mean anything is broken.

If it hangs, there may be another issue, e.g. an out-of-memory error (which TensorFlow sometimes fails to report 👎).

We are in the process of reimplementing the models in PyTorch, which will get rid of these confusing errors, and we are also publishing a tiny distilled version of the model that is fast and small enough to run without a GPU. This should be done within a couple of days to a week.
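In the meantime, if the CUDA warnings clutter the logs, TensorFlow's native logging can be quieted with the `TF_CPP_MIN_LOG_LEVEL` environment variable. A minimal docker-compose sketch - the service name and image here are assumptions based on the log prefix and the tag mentioned above; this only hides the messages and does not change behavior:

```yaml
# docker-compose.yml fragment (service name and image are assumed)
services:
  nboost:
    image: tensorflow/tensorflow:1.15.0-py3
    environment:
      # 0 = all logs, 1 = hide INFO, 2 = hide INFO and WARNING, 3 = errors only
      - TF_CPP_MIN_LOG_LEVEL=2
```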

0mars commented

@pertschuk thanks for all the efforts you put into this :) , keep us posted

Just released the new package that uses PyTorch (much cleaner) by default. The new default model is pt-tinybert-msmarco (PyTorch TinyBERT), which is about six times as fast and works well without a GPU. Benchmarks have been added as well.

0mars commented

Hi, I used this image: koursaros/nboost:latest-torch
and it complains about missing tensorflow:

nboost_1                 | Traceback (most recent call last):
nboost_1                 |   File "/opt/conda/bin/nboost", line 8, in <module>
nboost_1                 |     sys.exit(main())
nboost_1                 |   File "/opt/conda/lib/python3.6/site-packages/nboost/__main__.py", line 7, in main
nboost_1                 |     proxy = create_proxy()
nboost_1                 |   File "/opt/conda/lib/python3.6/site-packages/nboost/cli/__init__.py", line 68, in create_proxy
nboost_1                 |     args = parser.parse_args(argv)
nboost_1                 |   File "/opt/conda/lib/python3.6/argparse.py", line 1734, in parse_args
nboost_1                 |     args, argv = self.parse_known_args(args, namespace)
nboost_1                 |   File "/opt/conda/lib/python3.6/argparse.py", line 1766, in parse_known_args
nboost_1                 |     namespace, args = self._parse_known_args(args, namespace)
nboost_1                 |   File "/opt/conda/lib/python3.6/argparse.py", line 1997, in _parse_known_args
nboost_1                 |     self._get_value(action, action.default))
nboost_1                 |   File "/opt/conda/lib/python3.6/argparse.py", line 2294, in _get_value
nboost_1                 |     result = type_func(arg_string)
nboost_1                 |   File "/opt/conda/lib/python3.6/site-packages/nboost/cli/__init__.py", line 50, in <lambda>
nboost_1                 |     parser.add_argument('--model', type=lambda x: import_class('model', x), default='BertModel', help=MODEL)
nboost_1                 |   File "/opt/conda/lib/python3.6/site-packages/nboost/cli/__init__.py", line 62, in import_class
nboost_1                 |     return getattr(importlib.import_module(file), name)
nboost_1                 |   File "/opt/conda/lib/python3.6/importlib/__init__.py", line 126, in import_module
nboost_1                 |     return _bootstrap._gcd_import(name[level:], package, level)
nboost_1                 |   File "<frozen importlib._bootstrap>", line 994, in _gcd_import
nboost_1                 |   File "<frozen importlib._bootstrap>", line 971, in _find_and_load
nboost_1                 |   File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
nboost_1                 |   File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
nboost_1                 |   File "<frozen importlib._bootstrap_external>", line 678, in exec_module
nboost_1                 |   File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
nboost_1                 |   File "/opt/conda/lib/python3.6/site-packages/nboost/model/bert_model/__init__.py", line 3, in <module>
nboost_1                 |     import tensorflow as tf
nboost_1                 | ModuleNotFoundError: No module named 'tensorflow'

Hi, we renamed the torch image to koursaros/nboost:latest-pt. Sorry about that.
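For anyone updating an existing setup, the fix is just swapping the image tag - a docker-compose sketch with the service name assumed from the log prefix above:

```yaml
# docker-compose.yml fragment (service name is assumed)
services:
  nboost:
    # renamed: koursaros/nboost:latest-torch -> koursaros/nboost:latest-pt
    image: koursaros/nboost:latest-pt
```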

Hmm, this is the first public TinyBERT implementation I've seen. Would you consider releasing more of these (or the code to create them) for other languages / corpora in their own right? Extremely handy.